Abstract

We study Boolean functions with sparse Fourier coefficients or small spectral norm, and
their applications to the Log-rank Conjecture for XOR functions $f(x \oplus y)$, a fairly large class
of functions including well-studied ones such as Equality and Hamming Distance. The rank
of the communication matrix $M_f$ for such functions is exactly the Fourier sparsity of $f$. Let
$d = \deg_2(f)$ be the $\mathbb{F}_2$-degree of $f$ and $D^{cc}(f \circ \oplus)$ stand for the deterministic communication
complexity for $f(x \oplus y)$. We show that



1. $D^{cc}(f \circ \oplus) = O(2^{d^2/2} \log^{d-2} \|\hat f\|_1)$. In particular, the Log-rank Conjecture holds for XOR
functions with constant $\mathbb{F}_2$-degree.

2. $D^{cc}(f \circ \oplus) = O(d \|\hat f\|_1) = O(\sqrt{\mathrm{rank}(M_f)} \log \mathrm{rank}(M_f))$. This improves the (trivial) linear
bound by nearly a quadratic factor.

We obtain our results through a degree-reduction protocol based on a variant of polynomial rank,
and actually conjecture that the communication cost of our protocol is at most $\log^{O(1)} \mathrm{rank}(M_f)$.
The above bounds are obtained from different analyses of the number of parity queries required
to reduce $f$'s $\mathbb{F}_2$-degree. Our bounds also hold for the parity decision tree complexity of $f$, a
measure that is no less than the communication complexity.

Along the way we also prove several structural results about Boolean functions with small
Fourier sparsity $\|\hat f\|_0$ or spectral norm $\|\hat f\|_1$, which could be of independent interest. For
functions $f$ with constant $\mathbb{F}_2$-degree, we show that: 1) $f$ can be written as the summation of
quasi-polynomially many indicator functions of subspaces with $\pm$-signs, improving the previous
doubly exponential upper bound by Green and Sanders; 2) being sparse in the Fourier domain is
polynomially equivalent to having a small parity decision tree complexity; and 3) $f$ depends
only on $\mathrm{polylog}\,\|\hat f\|_1$ linear functions of input variables. For functions $f$ with small spectral
norm, we show that: 1) there is an affine subspace of co-dimension $O(\|\hat f\|_1)$ on which $f(x)$ is a
constant, and 2) there is a parity decision tree of depth $O(\|\hat f\|_1 \log \|\hat f\|_0)$.
1 Introduction

Fourier analysis of Boolean functions. Fourier analysis has been widely used in theoretical 
computer science to study Boolean functions with applications in PCP, property testing, learning, 
circuit complexity, coding theory, social choice theory and many more; see [O'D12] for a comprehensive
survey. The Fourier coefficients of a Boolean function measure the function's correlations
with parity functions; the distribution as well as various norms of Fourier spectrum have been found 
to be related to many complexity measures of the function. However, another natural measure, 
Fourier sparsity - i.e. the number of non-zero Fourier coefficients - has been much less studied. 
It seems to be of fundamental interest to understand properties of functions that are Boolean in 
the function domain and, at the same time, sparse in the Fourier domain. In particular, what 
Boolean functions have sparse Fourier spectra? Being sparse in the Fourier domain should imply 
that the function is simple, but in which aspects? Gopalan et al. [GOS+11] studied the problem of
testing Fourier sparsity and low-dimensionality and revealed several interesting structural results
for Boolean functions having or close to having sparse Fourier spectra. In a related setting, Green
and Sanders [GS08] showed that Boolean functions with a small spectral norm (i.e. the $\ell_1$-norm
of the Fourier spectrum) can be decomposed into a small number of signed indicator functions of
subspaces. However, the number of subspaces in their bound is doubly exponential in terms of the
function's spectral norm, which makes their result hard to apply in many computer science related
problems.

The Log-rank Conjecture in communication complexity. In a different vein, Fourier spar- 
sity also naturally arises in the study of Log-rank Conjecture in communication complexity. Com- 
munication complexity quantifies the minimum amount of communication needed for computation 
on inputs distributed to different parties [Yao79, KN97]. In a standard scenario, two parties Alice
and Bob each hold an input $x$ and $y$, respectively, and they desire to compute a function $f$ on input
$(x,y)$ by as little communication as possible. Apart from its own interest as a question about distributed
computation, communication complexity has also found numerous applications in proving
lower bounds in complexity theory, as well as connections to linear algebra, graph theory, etc. 

Of particular interest are lower bounds of communication complexity, and one of the most widely
used methods is based on the rank of the communication matrix $M_f = [f(x,y)]_{x,y}$; see [LS09] for
an extensive survey on classical and quantum lower bounds proved by rank and its variations (such
as the approximate rank and its equivalent, the $\gamma_2$-norm). Since it was shown 30 years ago [MS82] that
$\log \mathrm{rank}(M_f)$ is a lower bound of the deterministic communication complexity $D^{cc}(f)$, the tightness
of the lower bound has long been an important open question. The Log-rank Conjecture, proposed
by Lovász and Saks [LS88], asserts that the lower bound is polynomially tight for all total Boolean
functions $f$, namely $D^{cc}(f) \le \log^c \mathrm{rank}(M_f)$ for some absolute constant $c$. As one of the most
important problems in communication complexity, the conjecture links communication complexity,
a combinatorially defined quantity, to matrix rank, a much better understood measure in linear
algebra. Should the conjecture hold, understanding the communication complexity is more or less
reduced to a usually much easier task of calculating matrix ranks. The conjecture is also known to
be equivalent to many other conjectures [LS88, Lov90, Val04, ASTS+03].

Despite its importance, the Log-rank Conjecture is also notoriously hard to attack. Nisan and
Wigderson [NW95] showed that to prove the conjecture, it is sufficient to show a seemingly weaker
statement about the existence of a large monochromatic rectangle. In the same paper, they also
exhibited an example $f$ for which $\log \mathrm{rank}(M_f) = O(D^{cc}(f)^{\alpha})$ where $\alpha = \log_3 2 = 0.63\ldots$, later
improved by Kushilevitz to $\alpha = \log_6 3 = 0.61\ldots$ (also in [NW95]). The best upper bound for
$D^{cc}(f)$ in terms of rank is $D^{cc}(f) \le (\log \frac{4}{3})\,\mathrm{rank}(M_f)$ [KL96, Kot97]. Recently, assuming the
Polynomial Freiman-Ruzsa conjecture in additive combinatorics, Ben-Sasson, Lovett and Ron-Zewi
gave in [BSLRZ12] a better upper bound $D^{cc}(f) \le O(\mathrm{rank}(M_f)/\log \mathrm{rank}(M_f))$.

Communication complexity of XOR functions. In view of the difficulty of the Log-rank 
Conjecture in its full generality, Shi and Zhang [ZS10] initiated the study of communication complexity
of a special class of functions called XOR functions.

Definition 1. We say $F(x,y) : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ is an XOR function if there exists an
$f : \{0,1\}^n \to \{0,1\}$ such that for all $x$ and $y$ in $\{0,1\}^n$, $F(x,y) = f(x \oplus y)$, where $\oplus$ is the bit-wise
XOR. Denote $F$ by $f \circ \oplus$.

XOR functions include important examples such as Equality and Hamming Distance, and the
communication complexity of XOR functions has recently drawn an increasing amount of attention
[ZS09, ZS10, LZ10, MO10, LLZ11, SW12, LZ13]. In general, the additional symmetry in the communication
matrix $M_F$ should make the Log-rank Conjecture easier for XOR functions. In particular,
a very nice feature of XOR functions is that the rank of the communication matrix $M_F$ is exactly
the Fourier sparsity of $f$, the number of nonzero Fourier coefficients of $f$.

Proposition 1 ([BC99]). For XOR functions $F(x,y) = f(x \oplus y)$, it holds that $\mathrm{rank}(M_F) = \|\hat f\|_0$.
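
As a small sanity check of Proposition 1, the following sketch compares the rank of the communication matrix with the Fourier sparsity for two 3-bit examples. The helper names (`fourier_sparsity`, `matrix_rank`, `communication_matrix`) and the choice of examples are ours and purely illustrative; inputs are encoded as $n$-bit integers and the rank is computed over the rationals.

```python
from fractions import Fraction

def parity(v):
    """Parity of the number of 1 bits in the integer v."""
    return bin(v).count("1") & 1

def fourier_sparsity(f, n):
    """Count the s for which sum_x f(x) * (-1)^(s.x) is nonzero."""
    size = 2 ** n
    return sum(
        1 for s in range(size)
        if sum(f(x) * (-1) ** parity(s & x) for x in range(size)) != 0
    )

def matrix_rank(rows):
    """Rank over the rationals, by Gaussian elimination on exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def communication_matrix(f, n):
    """M_F with entries F(x, y) = f(x XOR y)."""
    size = 2 ** n
    return [[f(x ^ y) for y in range(size)] for x in range(size)]

equality = lambda z: 1 if z == 0 else 0                  # Equality as an XOR function
majority = lambda z: 1 if bin(z).count("1") >= 2 else 0  # 3-bit majority of x XOR y

for name, f in [("equality", equality), ("majority", majority)]:
    print(name, matrix_rank(communication_matrix(f, 3)), fourier_sparsity(f, 3))
```

The two printed numbers agree for each function, as the proposition predicts; the exhaustive computation is only feasible for very small $n$.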

Therefore the Log-rank Conjecture for XOR functions is equivalent to the question of whether
every Fourier sparse$^1$ function $f$ admits an efficient communication protocol to compute $f(x \oplus y)$,
or more specifically, whether $D^{cc}(f \circ \oplus) \le \log^{O(1)} \|\hat f\|_0$ holds for every Boolean function $f$.

However, the Log-rank Conjecture still seems very difficult to study even for this special class
of functions. The only previously known results are that the Log-rank Conjecture for XOR functions
holds for all symmetric functions [ZS09], monotone functions and linear threshold functions
(LTFs) [MO10], and $\mathrm{AC}^0$ functions; see Section 1.2 for more details. One nice approach proposed
in [ZS10] is to first design an efficient parity decision tree (PDT) for computing $f$, and then to simulate
it by a communication protocol. Parity decision trees allow querying the parity of any subset
of input variables (instead of just one input variable as in usual decision trees). A communication
protocol can exchange two bits $\ell(x)$ and $\ell(y)$ (here $\ell(\cdot)$ is an arbitrary linear function) to simulate
one query $\ell(x \oplus y)$ in a PDT, thus $D^{cc}(f \circ \oplus)$ is at most twice $D_\oplus(f)$, the parity decision tree
complexity of $f$. It is therefore sufficient to show that $D_\oplus(f) \le \log^{O(1)} \|\hat f\|_0$ for all $f$ to prove the
Log-rank Conjecture for XOR functions. Parity decision tree complexity is an interesting complexity
measure on its own, with connections to learning [KM93] and other parity complexity measures such as
parity certificate complexity and parity block sensitivity [ZS10]. This approach is also appealing
for the purpose of understanding Boolean functions with sparse Fourier spectra. It is not hard to
see that small $D_\oplus(f)$ implies Fourier sparsity; now if $D_\oplus(f) \le \log^{O(1)} \|\hat f\|_0$ is true, then functions
with small Fourier sparsity also have short parity decision trees. Thus the elusive property of being
Fourier sparse is roughly equivalent to the combinatorial and computational property of having
a small PDT.



1 Note that if the Fourier sparsity of $f$ is large, say $2^n$, then the Log-rank Conjecture is vacuously true for $f$, as
the communication complexity of any function is at most $O(n)$.



Back to the Log-rank Conjecture, though upper bounds for $D_\oplus(f)$ translate to efficient protocols
for $D^{cc}(f \circ \oplus)$, designing efficient PDT algorithms does not itself seem to be an easy
task. To see this, let us examine the effect of parity queries. Each query "$t \cdot x = ?$" basically
generates two subfunctions through restriction, and its effect on the Fourier domain can be shown
to be $\hat{f_b}(s) = \hat f(s) + (-1)^b \hat f(s+t)$, where $f_b$ is the subfunction obtained from restricting $f$ on
the half space $\{x : t \cdot x = b\}$. Thus the process is like folding the spectrum of $\hat f$ along the line $t$,
and we hope that the folding has many "collisions" in nonzero Fourier coefficients, namely many
$s \in \mathrm{supp}(\hat f)$ with $s + t \in \mathrm{supp}(\hat f)$ as well. In general, small $D_\oplus(f)$ implies that many Fourier
coefficients$^2$ are "well aligned" with respect to a subspace $V$ with a small co-dimension, so that
querying a basis of $V$ makes those Fourier coefficients collide. But the question is: where is the
subspace?
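
The folding identity can be checked numerically on small examples. The sketch below is our own illustration (the names `fourier`, `restricted_spectrum` and `folding_holds` are hypothetical): it computes the spectrum of $f$ restricted to the half space $\{x : t \cdot x = b\}$ as an average over that half space and compares it with $\hat f(s) + (-1)^b \hat f(s+t)$. Inputs are $n$-bit integers and $s + t$ is the bitwise XOR.

```python
from fractions import Fraction

def parity(v):
    """Parity of the number of 1 bits in the integer v."""
    return bin(v).count("1") & 1

def fourier(f, n):
    """f_hat(s) = 2^-n * sum_x f(x) * (-1)^(s.x), as exact fractions."""
    size = 2 ** n
    return [Fraction(sum(f(x) * (-1) ** parity(s & x) for x in range(size)), size)
            for s in range(size)]

def restricted_spectrum(f, n, t, b):
    """Average of f(x) * (-1)^(s.x) over the half space {x : t.x = b}, for every s."""
    half = [x for x in range(2 ** n) if parity(t & x) == b]
    return [Fraction(sum(f(x) * (-1) ** parity(s & x) for x in half), len(half))
            for s in range(2 ** n)]

def folding_holds(f, n, t, b):
    """Check f_b_hat(s) = f_hat(s) + (-1)^b * f_hat(s + t) for every s."""
    f_hat = fourier(f, n)
    g_hat = restricted_spectrum(f, n, t, b)
    return all(g_hat[s] == f_hat[s] + (-1) ** b * f_hat[s ^ t]
               for s in range(2 ** n))

and3 = lambda x: 1 if x == 7 else 0
print(all(folding_holds(and3, 3, t, b) for t in range(1, 8) for b in (0, 1)))
```

Each pair $\{s, s+t\}$ yields the same value up to the sign $(-1)^b$, which is why the restricted spectrum has at most half as many distinct entries to track.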

Note that $D_\oplus(f)$ is invariant under change of input basis, thus one tempting way to upper
bound $D_\oplus(f)$ is to first rotate the input basis, and then (under the new basis) use the well-known
fact that the standard decision tree complexity $D(f)$ is at most $O(\deg(f)^4)$, where $\deg(f)$ is
the (Fourier) degree ($\max_{s : \hat f(s) \ne 0} |s|$) of $f$ [BdW02]. Thus if $\deg(f) = \log^{O(1)} \|\hat f\|_0$, then
$D_\oplus(f) \le D(f) \le \log^{O(1)} \|\hat f\|_0$. However, one should also note that this approach cannot handle all the Fourier
sparse functions because, as shown in [ZS10], there exists a function $f$ such that $D_\oplus(f) \le \log_2 n + 4$
but $D(f) \ge n/4$; the latter holds even under an arbitrary basis change (i.e. $\min_L D(Lf) \ge n/4$
where $Lf(x) = f(Lx)$).

1.1 Our approach, ideas, and results 

Result 1: Main protocol and general conjecture. In previous studies of parity decision trees,
one needs to upper bound the number of queries for all possible execution paths. In this paper,
we show that it suffices to prove the existence of one short path! To put this into context, we need
the concept of polynomial rank. View a Boolean function $f : \{0,1\}^n \to \{0,1\}$ as a polynomial in
$\mathbb{F}_2[x_1, \ldots, x_n]$. Call the degree of this polynomial the $\mathbb{F}_2$-degree, denoted as $\deg_2(f)$. The polynomial
rank of $f$ is the minimum number $r$ s.t. $f$ can be written as

$f = \ell_1 f_1 + \cdots + \ell_r f_r + f_0$, (1)

where each $\ell_i$ is a linear function in $x$ and each $f_i$ is a function of $\mathbb{F}_2$-degree at most $\deg_2(f) - 1$.
Now we describe a simple PDT algorithm: query all $\ell_i(x)$ and get answers $a_i$, after which we face
a new function $f' = \sum_{i=1}^{r} a_i f_i + f_0$. Recurse on this function. Note that from $f$ to $f'$, the
$\mathbb{F}_2$-degree is reduced by at least 1, and one can also show that the Fourier sparsity of $f'$ is at most
that of $f$. Finally, it is known that $d \le \log \|\hat f\|_0$. Putting these nice properties together, we know
that as long as the polynomial rank of an arbitrary function $f$ is upper bounded by $\log^{O(1)} \|\hat f\|_0$,
so is $D_\oplus(f)$.

Conjecture 2. For all $f : \{0,1\}^n \to \{0,1\}$, we have $\mathrm{rank}(f) = O(\log^c(\|\hat f\|_0))$ for some $c = O(1)$.

Theorem 3. If Conjecture 2 is true, then

1. All Boolean functions with small $\|\hat f\|_0$ have small parity decision tree complexity as well:

$D_\oplus(f) = O(\log^{c+1}(\|\hat f\|_0))$.

2. The Log-rank Conjecture is true for all XOR functions: $D^{cc}(f \circ \oplus) = O(2\log^{c+1}(\|\hat f\|_0))$.



2 Technically, we mean characters with the corresponding Fourier coefficients being nonzero.



Result 2: low degree polynomials. Next we focus on upper bounding the polynomial rank,
starting from small degrees. For degree-2 polynomials, the classic theorem by Dickson implies that
$\mathrm{rank}(f) = O(\log \|\hat f\|_0)$. For degree-3 polynomials, Haramaty and Shpilka proved in [HS10] that
$\mathrm{rank}(f) = O(\log^2(1/\|f\|_{U^3})) = O(\log^2(1/\mathrm{bias}(f)))$. By a proper shift, we can make
$\mathrm{bias}(f) \ge 1/\sqrt{\|\hat f\|_0}$ and thus get $\mathrm{rank}(f) = O(\log^2 \|\hat f\|_0)$. For degree-4 polynomials, however, the bound in
[HS10] is exponentially worse, and there were no results for higher degrees. A natural question
is: can one prove $\mathrm{rank}(f) = O(\log^{O(1)} \|\hat f\|_0)$ for degree-4 polynomials? Further, if it is too
challenging to prove $\mathrm{rank}(f) \le \log^{O(1)} \|\hat f\|_0$ for general degree $d$ (which is at most $\log \|\hat f\|_0$), can
one prove it for constant-degree polynomials (even if the power $O(1)$ is a tower of 2's of height $d$)?
In this paper, we show that this is indeed achievable. Actually, we can even replace the $\ell_0$-norm
by the $\ell_1$-norm$^3$ of $\hat f$ in the bound, and the dependence on $d$ is "only" singly exponential.

Lemma 4. For all Boolean functions $f$ with $\mathbb{F}_2$-degree $d$, we have $\mathrm{rank}(f) = O(2^{d^2/2} \log^{d-2} \|\hat f\|_1)$.

The lemma immediately implies the following two results.

Theorem 5. If $f$ is a Boolean function of constant $\mathbb{F}_2$-degree, then $D^{cc}(f \circ \oplus) \le \log^{O(1)}(\mathrm{rank}(M_{f \circ \oplus}))$.

Recursively expanding Eq. (1) and applying the bound on ranks in Lemma 4 gives the following.

Corollary 6. Every Boolean function f of^^-degree d depends only on 0(2 d < 2 log ||/||i) linear 
functions of input variables. 

Another corollary is the following. Green and Sanders proved that any $f : \{0,1\}^n \to \{0,1\}$ can
be written as $f = \sum_{i=1}^{T} \pm 1_{V_i}$, where $T = 2^{2^{O(\|\hat f\|_1^4)}}$ and each $1_{V_i}$ is the indicator function of the
subspace $V_i$. For constant degree polynomials, we can improve their doubly-exponential bound to
quasi-polynomial.

Corollary 7. If $f : \{0,1\}^n \to \{0,1\}$ has constant $\mathbb{F}_2$-degree, then $f = \sum_{i=1}^{T} \pm 1_{V_i}$ where
$T = 2^{\log^{O(1)} \|\hat f\|_1}$ and each $1_{V_i}$ is the indicator function of the subspace $V_i$.

The proof of Lemma 4 follows the general approach laid out in the Main protocol, i.e., a
rank-based degree-reduction process, with several additional twists. First, to find a "good" affine
subspace restricted on which $f$ becomes a lower degree polynomial, we recursively apply the derivatives
of $f$ to guide our search. Second, even though our final goal is to reduce the degree of $f$, we
actually achieve this through reducing the spectral norm of $f$. This is done by studying the effect
of restriction on two non-Boolean functions. Last, in the induction step, we in fact need to prove a
stronger statement about a chain inequality involving rank, minimum parity 0-certificate complexity
$C^0_{\oplus,\min}$, minimum parity 1-certificate complexity $C^1_{\oplus,\min}$ and parity decision tree complexity $D_\oplus$.
And the induction is used in a "cyclic" way: we upper bound $\min\{C^0_{\oplus,\min}, C^1_{\oplus,\min}\}$ by induction on
$\max\{C^0_{\oplus,\min}, C^1_{\oplus,\min}\}$, which upper bounds rank. This can then be used to show that $D_\oplus$ is small,
which in turn upper bounds $\max\{C^0_{\oplus,\min}, C^1_{\oplus,\min}\}$ to finish the inductive step.



3 Strictly speaking, in view of the corner case of $\|\hat f\|_1 = 1$, one should replace $\log(\|\hat f\|_1)$ by $\log(\|\hat f\|_1 + 1)$. But like
in most previous papers, we omit the "+1" term for all $\|\hat f\|_1$ in this paper for simplicity of notation.



Result 3: functions with small spectral norms. While Theorem 5 handles the low-degree
case, the bound deteriorates exponentially with the $\mathbb{F}_2$-degree. Via a different approach, we are
able to upper bound $\mathrm{rank}(f)$ by the $\ell_1$-norm of $\hat f$.

Lemma 8. For all $f : \{0,1\}^n \to \{0,1\}$, we have $\mathrm{rank}(f) \le O(\|\hat f\|_1)$.

In fact we prove a slightly stronger result that there exists an affine subspace of co-dimension
at most $O(\|\hat f\|_1)$ on which $f$ is constant. In other words, if a Boolean function has small spectral
norm then it is constant on a large affine subspace.

The proof of the lemma uses a greedy algorithm that always makes the two largest Fourier
coefficients collide (with the same sign). Exploiting the property that $f$ is Boolean, one can show
that this greedy folding either significantly increases the largest Fourier coefficient, or decreases
$\|\hat f\|_1$ by a constant.
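
A brute-force sketch of such a greedy folding, for intuition only, is given below. It is our reading of the one-sentence description above rather than the algorithm as analyzed in the proof: the function names, the tie-breaking rule, and the stopping rule (stop once $f$ is constant on the current affine subspace) are assumptions, and the exhaustive spectrum computation only makes sense for very small $n$. Here $f$ takes values in $\{+1,-1\}$ and inputs are $n$-bit integers.

```python
from fractions import Fraction

def parity(v):
    """Parity of the number of 1 bits in the integer v."""
    return bin(v).count("1") & 1

def coset_rep(s, basis):
    """Smallest element of s + span(basis); basis is in echelon form, largest first."""
    for v in basis:
        s = min(s, s ^ v)
    return s

def greedy_folding(f, n):
    """Return parity constraints (t, b) cutting out an affine subspace on which f
    is constant (hypothetical helper; f maps n-bit integers to +1/-1)."""
    constraints, basis = [], []
    points = list(range(2 ** n))          # current affine subspace H
    while len({f(x) for x in points}) > 1:
        # One coefficient of f restricted to H per coset of span(basis).
        reps = sorted({coset_rep(s, basis) for s in range(2 ** n)})
        coeff = {r: Fraction(sum(f(x) * (-1) ** parity(r & x) for x in points),
                             len(points))
                 for r in reps}
        # The two largest coefficients in absolute value (ties: smaller index first).
        first, second = sorted(reps, key=lambda r: abs(coeff[r]), reverse=True)[:2]
        t = first ^ second
        # Pick the half space on which the two coefficients add up in absolute value.
        b = 0 if coeff[first] * coeff[second] >= 0 else 1
        constraints.append((t, b))
        basis = sorted(basis + [coset_rep(t, basis)], reverse=True)
        points = [x for x in points if parity(t & x) == b]
    return constraints

and_pm = lambda x: -1 if x == 7 else 1    # 3-bit AND with range {+1, -1}
print(greedy_folding(and_pm, 3))
```

Each fold halves the current affine subspace, so the loop stops after at most $n$ folds; the point of the lemma is that for functions with small spectral norm far fewer folds are needed, which this sketch does not attempt to prove.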

The lemma immediately implies the following result for general (not necessarily XOR) functions.

Theorem 9. For all $f : \{0,1\}^m \times \{0,1\}^n \to \{0,1\}$, we have $D^{cc}(f) \le 2D_\oplus(f) = O(\deg_2(f) \cdot \|\hat f\|_1)$.

In [Gro97], Grolmusz gave a public-coin randomized protocol with communication cost $O(\|\hat f\|_1^2)$.
The above theorem gives a deterministic protocol, and the bound is better for functions $f$ with
$\deg_2(f) = o(\|\hat f\|_1)$.

Another implication of Lemma 8 is that the communication complexity of $f \circ \oplus$ is at most the
square root of the matrix rank, up to a logarithmic factor.

Theorem 10. For all $f : \{0,1\}^n \to \{0,1\}$,

$D^{cc}(f \circ \oplus) = O(\deg_2(f) \cdot \|\hat f\|_1) = O(\sqrt{\mathrm{rank}(M_{f \circ \oplus})} \log \mathrm{rank}(M_{f \circ \oplus}))$.

The upper bound of $\mathrm{rank}/\log \mathrm{rank}$ in [BSLRZ12] improves the trivial linear bound by a log
factor for all Boolean functions, assuming the Polynomial Freiman-Ruzsa conjecture. In comparison,
our bound of $\sqrt{\mathrm{rank}} \log \mathrm{rank}$ is only for XOR functions, but it improves the linear bound by
a polynomial factor, and it is unconditional.

It is also interesting to note that, for any fixed Boolean function $f$, at least one of the above two
theorems gives a desirable result: either $\|\hat f\|_1 \ge \log^k \|\hat f\|_0$ where $k$ is a big constant, in which case Theorem
9 improves Grolmusz's bound almost quadratically (since $\deg_2(f) \le \log \|\hat f\|_0$); or $\|\hat f\|_1 < \log^k \|\hat f\|_0$,
in which case Theorem 10 confirms the Log-rank Conjecture for $f \circ \oplus$!

Result 4: functions with a light Fourier tail. Our last result deals with Boolean functions
whose Fourier spectrum has a light tail. We call a function $f : \{0,1\}^n \to \{+1,-1\}$ $\mu$-close to
$s$-sparse in $\ell_2$ if $\sum_{i > s} \hat f(s_i)^2 \le \mu^2$, where $|\hat f(s_1)| \ge \cdots \ge |\hat f(s_N)|$. We say two functions $f, g :
\{0,1\}^n \to \{+1,-1\}$ are $\epsilon$-close if $\Pr_x[f(x) \ne g(x)] \le \epsilon$.

Theorem 11. If f : {0,1}" — > {+1,-1} is [i-close to s-sparse in £2, where fj, < -2S^-_ Miifi. and 

vll/llo 

a < log ^) ll/llo, then D©(/) < log ^) ||/|| . 

The proof of this theorem uses Chang's lemma about large Fourier coefficients of low-density
functions, and a "rounding" lemma from [GOS+11].



1.2 Related work 

The Log-rank Conjecture for XOR functions was shown to be true for symmetric functions [ZS09],
linear threshold functions (LTFs), monotone functions [MO10], and $\mathrm{AC}^0$ functions [KS13]. These
results fall into two categories. In the first one, including symmetric functions and LTFs,
the rank of the communication matrix (i.e. the Fourier sparsity) is so large that the Log-rank
Conjecture trivially holds. In the second one, including monotone functions and $\mathrm{AC}^0$ functions,
even the Fourier degree is small, thus the standard decision tree complexity $D(f)$ is already
upper bounded by a polylogarithm of the matrix rank. But as we mentioned, there are functions
that have small Fourier sparsity and high Fourier degree (even after basis rotation), which form the
hardcore cases of the problem. Our study makes crucial use of the fact that parity queries are more
powerful than single input-variable queries, and our results reveal structural properties of Fourier
spectra.

In [HS10], Haramaty and Shpilka proved that $\mathrm{rank}(f) = O(\log^2(1/\|f\|_{U^3})) = O(\log^2(1/\mathrm{bias}(f)))$
for degree-3 polynomials. For degree-4 polynomials, however, the bound gets exponentially worse,
and there were no results for higher degrees. In comparison, our Lemma 4 gives a polylog upper
bound for $\mathrm{rank}(f)$ of all constant degree functions $f$, but the polylog is in $\|\hat f\|_1$ rather than in
$\mathrm{bias}(f)$ or the Gowers norm ([Gow98, Gow01, AKK+05]).



Though Boolean functions with a sparse Fourier spectrum seem to be a very interesting class
of functions to study, not many properties are known. It is shown in [GOS+11] that the Fourier
coefficients of a Fourier sparse function have large "granularity" and functions that are very close
to Fourier sparse can be transformed into one through a "rounding off" procedure. Furthermore,
they proved that one can use $2 \log \|\hat f\|_0$ random linear functions to partition the character space so
that, with high probability, each bucket contains at most one nonzero Fourier coefficient. This does
not help our problem since what we need is exactly the opposite: to group Fourier coefficients into
buckets so that a small number of foldings would make many of them collide (thus reducing
the Fourier sparsity quickly).

Let $A = \mathrm{supp}(\hat f)$ be the support of $f$'s Fourier spectrum. One way of designing the parity query
is to look for a "heavy hitter" $t$ of the set $A + A$, i.e. $t$ with many $s_1, s_2 \in A$ and $s_1 + s_2 = t$. If such $t$
exists, then querying the linear function $\langle t, x \rangle$ reduces the Fourier sparsity a lot. One natural way
to show the existence of a heavy hitter is by proving that $|A + A|$ is small. Turning this around,
one may hope to show that if it is large, then the function is not Fourier sparse or has some special
properties to be used. The size of $|A + A|$ has been extensively studied in additive combinatorics,
but it seems that all related studies are concerned with the low-end case, in which $|A + A| \le k|A|$
for very small (usually constant) $k$. Thus those results do not apply to our question.

There are actually two variants of polynomial rank. One is what we mentioned earlier and used
in this work, and the other, which is actually much better studied, is the minimum $r$ s.t. $f$ can be
expressed as a function $F$ of $r$ lower degree polynomials $f_1, \ldots, f_r$. A nice result for this definition
of rank is that large bias implies low polynomial rank [GT09, KL08]: the rank is a function of the
bias and degree only, but not of the input size $n$. This is, however, insufficient for us because a
Fourier sparse function may have very small bias. Furthermore, the dependence of the rank on the
degree is a very rapidly growing function (faster than a tower of 2's of height $d$), while our protocol
has "only" single exponential dependence on $d$.



The work of [SV13]. After completing this work independently, the very recent work [SV13]
came to our attention, which studies PDT complexity of functions with small spectral norm. The
authors show $C_{\oplus,\min}(f) \le O(\|\hat f\|_1^2)$ and $D_\oplus(f) = O(\|\hat f\|_1^2 \log \|\hat f\|_0)$. In comparison, our Lemma 30
and Theorem 10 are at least quadratically better. The paper [SV13] also studies the size of PDT
and shows that ®-size(/) < n (\\f\W\ and considers approximation of Boolean functions, which is 
not studied in this paper. 

2 Preliminaries and notation 

All the logarithms in this paper are base 2. For two $n$-bit vectors $s, t \in \{0,1\}^n$, define their inner
product as $s \cdot t = \langle s, t \rangle = \sum_{i=1}^{n} s_i t_i \bmod 2$ and for simplicity we write $s + t$ for $s \oplus t$.
We often use $f$ to denote a real function defined on $\{0,1\}^n$. In most
occurrences $f$ is a Boolean function, whose range can be represented by either $\{0,1\}$ or $\{+1,-1\}$,
and we will specify whenever needed. For $f : \{0,1\}^n \to \{0,1\}$, we define $f^\pm = 1 - 2f$ to convert
the range to $\{+1,-1\}$. For each $b \in \mathrm{range}(f)$, the $b$-density of $f$ is $\rho_b = |f^{-1}(b)|/2^n$.

Each Boolean function $f : \{0,1\}^n \to \{0,1\}$ can be viewed as a polynomial over $\mathbb{F}_2$, and we use
$\deg_2(f)$ to denote the $\mathbb{F}_2$-degree of $f$. For a Boolean function $f : \{0,1\}^n \to \{0,1\}$ and a direction
vector $t \in \{0,1\}^n - \{0^n\}$, its derivative $\Delta_t f$ is defined by $\Delta_t f(x) = f(x) + f(x + t)$. It is easy to
check that $\deg_2(\Delta_t f) < \deg_2(f)$ for any non-constant $f$ and any $t$.
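
The degree drop under differentiation is easy to observe on a small example. The sketch below is an illustration with our own helper names (`anf`, `f2_degree`, `derivative`): it computes the $\mathbb{F}_2$-polynomial of a function via the Möbius transform and evaluates the degree of every derivative. Inputs are $n$-bit integers with $x_1$ as the lowest bit, and the zero polynomial is assigned degree 0 by convention.

```python
def anf(f, n):
    """Moebius transform: coefficient of monomial S is the XOR of f(x) over x <= S."""
    coeffs = [f(x) & 1 for x in range(2 ** n)]
    for i in range(n):
        for x in range(2 ** n):
            if (x >> i) & 1:
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    return coeffs

def f2_degree(f, n):
    """Size of the largest monomial in the F_2-polynomial of f (0 if f is zero)."""
    return max((bin(m).count("1") for m, c in enumerate(anf(f, n)) if c), default=0)

def derivative(f, t):
    """Delta_t f(x) = f(x) + f(x + t), with both additions over F_2."""
    return lambda x: f(x) ^ f(x ^ t)

# Example: f(x) = x1 x2 x3 + x1 over F_2.
f = lambda x: ((x & 1) & (x >> 1 & 1) & (x >> 2 & 1)) ^ (x & 1)

print(f2_degree(f, 3), [f2_degree(derivative(f, t), 3) for t in range(1, 8)])
```

The first printed number is the degree of $f$ and every entry of the list is strictly smaller, in line with the claim above.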

Complexity measures. A parity decision tree (PDT) for a function $f : \{0,1\}^n \to \{0,1\}$ is a
tree with each internal node associated with a linear function $\ell(x)$, and each leaf associated with
an answer $a \in \{0,1\}$. When we use a parity decision tree to compute a function $f$, we start from
the root and follow a path down to a leaf. At each internal node, we query the associated linear
function, and follow the branch according to the answer to the query. When reaching a leaf, we
output the associated answer. The parity decision tree computes $f$ if on any input $x$, we always
get the output equal to $f(x)$. The deterministic parity decision tree complexity of $f$, denoted by
$D_\oplus(f)$, is the least number of queries needed on a worst-case input by a PDT that computes $f$.
For a Boolean function $f$ and an input $x$, the parity certificate complexity of $f$ on $x$ is

$C_\oplus(f, x) = \min\{\text{co-dim}(H) : x \in H$, $H$ is an affine subspace on which $f$ is constant$\}$.

The parity certificate complexity $C_\oplus(f)$ of $f$ is $\max_x C_\oplus(f, x)$. Since for each $x$ and each parity
decision tree $T$, the leaf that $x$ belongs to corresponds to an affine subspace of co-dimension equal to
the length of the path from it to the root, we have that $C_\oplus(f) \le D_\oplus(f)$ [ZS10]. We can also study
the minimum parity certificate complexities $C^b_{\oplus,\min}(f) = \min_{x : f(x) = b} C_\oplus(f, x)$ and
$C_{\oplus,\min}(f) = \min_x C_\oplus(f, x)$.

Denote by $D^{cc}(F)$ the deterministic communication complexity of $F$. One way of designing
communication protocols is to simulate a decision tree algorithm, and the following is a
variant of a well-known relation between deterministic communication complexity and decision tree
complexity, adapted to the setting of XOR functions and parity decision trees.

Fact 12. $D^{cc}(f \circ \oplus) \le 2D_\oplus(f)$.
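
The simulation behind Fact 12 can be sketched as follows. The tree encoding is an assumption made for this illustration: a leaf is an integer answer and an internal node is a tuple `(mask, zero_child, one_child)`, where `mask` selects the variables whose parity is queried. For each query, Alice announces $\ell(x)$ and Bob announces $\ell(y)$, and both recover $\ell(x \oplus y) = \ell(x) \oplus \ell(y)$.

```python
def parity(v):
    """Parity of the number of 1 bits in the integer v."""
    return bin(v).count("1") & 1

def run_pdt(tree, z):
    """Evaluate the parity decision tree on input z; return (answer, queries)."""
    queries = 0
    while not isinstance(tree, int):
        mask, zero_child, one_child = tree
        tree = one_child if parity(mask & z) else zero_child
        queries += 1
    return tree, queries

def run_protocol(tree, x, y):
    """Simulate the tree on x XOR y; return (answer, bits communicated)."""
    bits = 0
    while not isinstance(tree, int):
        mask, zero_child, one_child = tree
        alice_bit = parity(mask & x)      # Alice sends l(x)
        bob_bit = parity(mask & y)        # Bob sends l(y)
        bits += 2
        tree = one_child if alice_bit ^ bob_bit else zero_child
    return tree, bits

# Hypothetical 3-bit tree: query z1 + z2, then (on answer 0) query z3.
tree = (0b011, (0b100, 1, 0), 0)
print(run_pdt(tree, 0b101 ^ 0b110), run_protocol(tree, 0b101, 0b110))
```

The protocol follows the same root-to-leaf path as the tree does on $x \oplus y$, and it spends exactly two bits per query.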



Fourier analysis 

For any real function $f : \{0,1\}^n \to \mathbb{R}$, the Fourier coefficients are defined by $\hat f(s) = 2^{-n} \sum_x f(x) \chi_s(x)$
where $\chi_s(x) = (-1)^{s \cdot x}$. The function $f$ can be written as $f = \sum_s \hat f(s) \chi_s$. The $\ell_p$-norm of $\hat f$ for
any $p > 0$, denoted by $\|\hat f\|_p$, is defined as $(\sum_s |\hat f(s)|^p)^{1/p}$. The Fourier sparsity of $f$, denoted by
$\|\hat f\|_0$, is the number of nonzero Fourier coefficients of $f$. The following is a simple consequence of
the Cauchy-Schwarz inequality:

$\|\hat f\|_1 \le \sqrt{\|\hat f\|_0} \cdot \|\hat f\|_2$. (2)



Note that $\|\hat f\|_1$ can be much smaller than $\|\hat f\|_0$. For instance, the AND function has $\|\hat f\|_1 \le 3$
but $\|\hat f\|_0 = 2^n$. The Fourier coefficients of $f : \{0,1\}^n \to \{0,1\}$ and $f^\pm$ are related by $\widehat{f^\pm}(s) =
\delta_{s,0^n} - 2\hat f(s)$, where $\delta_{x,y}$ is the Kronecker delta function. Therefore we have

$2\|\hat f\|_1 - 1 \le \|\widehat{f^\pm}\|_1 \le 2\|\hat f\|_1 + 1$, and $\|\hat f\|_0 - 1 \le \|\widehat{f^\pm}\|_0 \le \|\hat f\|_0 + 1$. (3)
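
The AND example can be worked out exactly for a small $n$. The sketch below (with our own helper names `fourier`, `spectral_norm` and `sparsity`, and $n = 4$ chosen arbitrarily) computes the spectral norm and sparsity of AND in both the $\{0,1\}$ and the $\{+1,-1\}$ representation.

```python
from fractions import Fraction

def parity(v):
    """Parity of the number of 1 bits in the integer v."""
    return bin(v).count("1") & 1

def fourier(f, n):
    """f_hat(s) = 2^-n * sum_x f(x) * (-1)^(s.x), as exact fractions."""
    size = 2 ** n
    return [Fraction(sum(f(x) * (-1) ** parity(s & x) for x in range(size)), size)
            for s in range(size)]

def spectral_norm(f_hat):
    """The l_1-norm of the Fourier spectrum."""
    return sum(abs(c) for c in f_hat)

def sparsity(f_hat):
    """The number of nonzero Fourier coefficients."""
    return sum(1 for c in f_hat if c != 0)

n = 4
and01 = lambda x: 1 if x == 2 ** n - 1 else 0   # range {0, 1}
and_pm = lambda x: 1 - 2 * and01(x)             # range {+1, -1}

for name, f in [("AND, range {0,1}", and01), ("AND, range {+1,-1}", and_pm)]:
    f_hat = fourier(f, n)
    print(name, spectral_norm(f_hat), sparsity(f_hat))
```

In both representations every one of the $2^n$ coefficients is nonzero while the spectral norm stays below 3, and the two spectral norms satisfy the first chain of inequalities in (3).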

For any function $f : \{0,1\}^n \to \mathbb{R}$, Parseval's Identity says that $\sum_s \hat f(s)^2 = \mathbb{E}_x[f(x)^2]$. When
the range of $f$ is $\{0,1\}$, then $\sum_s \hat f(s)^2 = \mathbb{E}_x[f(x)]$. We sometimes use $\hat f$ to denote the vector
$\{\hat f(s) : s \in \{0,1\}^n\}$.

Proposition 13 (Convolution). For two functions $f, g : \{0,1\}^n \to \mathbb{R}$, the Fourier spectrum of $fg$
is given by the following formula: $\widehat{fg}(s) = \sum_t \hat f(t) \hat g(s + t)$.

Using this proposition, one can characterize the Fourier coefficients of Boolean functions as 
follows. 

Proposition 14. A function $f : \{0,1\}^n \to \mathbb{R}$ has range $\{+1,-1\}$ if and only if

$\sum_{t \in \{0,1\}^n} \hat f^2(t) = 1$, and $\sum_{t \in \{0,1\}^n} \hat f(t) \hat f(s + t) = 0$, $\forall s \in \{0,1\}^n - \{0^n\}$.
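
Both the convolution formula and the characterization in Proposition 14 can be verified exactly on small inputs. The following sketch uses our own helper names (`fourier`, `convolve`, `spectrum_is_pm1_valued`); inputs are $n$-bit integers and $s + t$ is the bitwise XOR.

```python
from fractions import Fraction

def parity(v):
    """Parity of the number of 1 bits in the integer v."""
    return bin(v).count("1") & 1

def fourier(f, n):
    """f_hat(s) = 2^-n * sum_x f(x) * (-1)^(s.x), as exact fractions."""
    size = 2 ** n
    return [Fraction(sum(f(x) * (-1) ** parity(s & x) for x in range(size)), size)
            for s in range(size)]

def convolve(f_hat, g_hat):
    """(f_hat * g_hat)(s) = sum_t f_hat(t) * g_hat(s + t)."""
    size = len(f_hat)
    return [sum(f_hat[t] * g_hat[s ^ t] for t in range(size)) for s in range(size)]

def spectrum_is_pm1_valued(f_hat):
    """Proposition 14: the self-convolution is 1 at 0^n and 0 everywhere else."""
    auto = convolve(f_hat, f_hat)
    return auto[0] == 1 and all(v == 0 for v in auto[1:])

maj_pm = lambda x: -1 if bin(x).count("1") >= 2 else 1   # 3-bit majority, range +-1
and01 = lambda x: 1 if x == 7 else 0                     # 3-bit AND, range {0, 1}

print(spectrum_is_pm1_valued(fourier(maj_pm, 3)),
      spectrum_is_pm1_valued(fourier(and01, 3)))
```

The test passes for the $\{+1,-1\}$-valued majority and fails for the $\{0,1\}$-valued AND, since the condition is just $f^2 = 1$ written in the Fourier domain.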

Another fact that easily follows from the convolution formula is the following.
Lemma 15. Let $f, g : \{0,1\}^n \to \mathbb{R}$, then $\|\widehat{fg}\|_0 \le \|\hat f\|_0 \|\hat g\|_0$ and $\|\widehat{fg}\|_1 \le \|\hat f\|_1 \|\hat g\|_1$.

Linear maps and restrictions. Sometimes we need to rotate the input space: For an invertible
linear map $L$ on $\{0,1\}^n$, define $Lf$ by $Lf(x) = f(Lx)$. It is not hard to see that $\deg_2(Lf) = \deg_2(f)$,
and that $\widehat{Lf}(s) = \hat f((L^T)^{-1} s)$. Thus

$\|\widehat{Lf}\|_1 = \|\hat f\|_1$ and $\|\widehat{Lf}\|_0 = \|\hat f\|_0$. (4)

For a function $f : \{0,1\}^n \to \mathbb{R}$, define two subfunctions $f_0$ and $f_1$, both on $\{0,1\}^{n-1}$: $f_b(x_2, \ldots, x_n) =
f(b, x_2, \ldots, x_n)$. It is easy to see that for any $s \in \{0,1\}^{n-1}$, $\hat{f_b}(s) = \hat f(0s) + (-1)^b \hat f(1s)$, thus

$\|\hat{f_b}\|_0 \le \|\hat f\|_0$ and $\|\hat{f_b}\|_1 \le \|\hat f\|_1$. (5)

The concept of subfunctions can be generalized to general directions. Suppose $f : \{0,1\}^n \to \mathbb{R}$
and $S \subseteq \{0,1\}^n$ is a subset of the domain. Then the restriction of $f$ on $S$, denoted by $f|_S$, is the
function from $S$ to $\mathbb{R}$ defined naturally by $f|_S(x) = f(x)$, $\forall x \in S$. In this paper, we are concerned
with restrictions on affine subspaces.



Lemma 16. Suppose $f : \{0,1\}^n \to \mathbb{R}$ and $H = a + V$ is an affine subspace, then one can define
the spectrum $\widehat{f|_H}$ of the restricted function $f|_H$ such that

1. If $\text{co-dim}(H) = 1$, then $\widehat{f|_H}$ is the collection of $\hat f(s) + (-1)^b \hat f(s + t)$ for all unordered pairs
$(s, s + t)$, where $t$ is the unique non-zero vector orthogonal to $V$, and $b = 0$ if $a \in V$ and $b = 1$
otherwise. Sometimes we refer to such a restriction as a folding over $t$.

2. $\|\widehat{f|_H}\|_p \le \|\hat f\|_p$, for any $p \in [0,1]$.

3. If $\mathrm{range}(f) = \{+1,-1\}$, then the following three statements are equivalent: 1) $f|_H(x) =
c\chi_s(x)$ for some $s \in \{0,1\}^n$ and $c \in \{+1,-1\}$; 2) $\|\widehat{f|_H}\|_0 = 1$; and 3) $\|\widehat{f|_H}\|_1 = 1$.

See the appendix for a proof. In the proof, we use a rotation $R$ as an isomorphism from $H$
(which may not be a group under addition any more) to the additive group of $\{0,1\}^{n-1}$ (when
$\text{co-dim}(H) = 1$). Though the rotation is not unique, the resulting Fourier vector $\widehat{f|_H}$ is the same
up to a linear invertible transform, thus the norm $\|\widehat{f|_H}\|_p$ does not depend on the rotation. In
addition, the $\mathbb{F}_2$-degree of the subfunction $f|_H$ does not depend on the rotation $R$, thus we will just
define $\deg_2(f|_H)$ to be $\deg_2(f_b)$ where $f_b$ is the newly defined subfunction.

Using the above lemma, it is not hard to prove by induction the following fact, which says that 
short PDT gives Fourier sparsity. 

Proposition 17. For all $f : \{0,1\}^n \to \{0,1\}$, $\|\hat f\|_0 \le 4^{D_\oplus(f)}$.

The following theorem [BC99] says that the $\mathbb{F}_2$-degree can be bounded from above by the logarithm
of the Fourier sparsity.

Fact 18 ([BC99]). For all $f : \{0,1\}^n \to \{0,1\}$, it holds that $\deg_2(f) \le \log \|\hat f\|_0$.
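
For very small $n$ the inequality of Fact 18 can be confirmed by exhaustive search. The sketch below is an illustration with our own helper names; it enumerates every nonzero truth table on $n$ bits (the all-zero function is skipped because its sparsity is 0) and compares $2^{\deg_2(f)}$ with $\|\hat f\|_0$ to avoid floating-point logarithms.

```python
def parity(v):
    """Parity of the number of 1 bits in the integer v."""
    return bin(v).count("1") & 1

def f2_degree(table, n):
    """F_2-degree of the function given by its truth table (Moebius transform)."""
    coeffs = list(table)
    for i in range(n):
        for x in range(2 ** n):
            if (x >> i) & 1:
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    return max((bin(m).count("1") for m in range(2 ** n) if coeffs[m]), default=0)

def fourier_sparsity(table, n):
    """Number of s with nonzero sum_x f(x) * (-1)^(s.x)."""
    size = 2 ** n
    return sum(
        1 for s in range(size)
        if sum(table[x] * (-1) ** parity(s & x) for x in range(size)) != 0
    )

def violations(n):
    """Truth tables of nonzero functions with 2^deg_2(f) > sparsity (expected: none)."""
    size = 2 ** n
    bad = []
    for code in range(1, 2 ** size):          # skip the all-zero function
        table = [(code >> x) & 1 for x in range(size)]
        if 2 ** f2_degree(table, n) > fourier_sparsity(table, n):
            bad.append(table)
    return bad

print(len(violations(3)))
```

The search covers all 255 nonzero functions on three bits; it says nothing about larger $n$, where the number of functions grows doubly exponentially.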

3 Polynomial rank and the Main PDT algorithm 

The following notion of polynomial rank has been studied in [Dic58] for degree-2 polynomials and
in [HS10] for degree-3 polynomials.$^4$

Definition 2. The polynomial rank of a function $f \in \mathbb{F}_2[x_1, \ldots, x_n]$, denoted $\mathrm{rank}(f)$, is the
minimum integer $r$ s.t. $f$ can be expressed as

$f = \ell_1 f_1 + \cdots + \ell_r f_r + f_0$,

where $\deg_2(\ell_i) = 1$ for all $1 \le i \le r$ and $\deg_2(f_i) < \deg_2(f)$ for all $0 \le i \le r$. Sometimes we
emphasize the degree by writing the polynomial rank as $\mathrm{rank}_d(f)$ with $d = \deg_2(f)$.

Recall that a parity certificate is an affine subspace H restricted on which / is a constant. 
The parity certificate complexity is the largest co-dimension of such H. The next lemma says 
that the polynomial rank is quite small compared to the parity certificate complexity, even if we 
merely require $f$ to have a lower $\mathbb{F}_2$-degree (rather than be constant) on the affine subspace; and
in addition, even if we take the minimum co-dimension over all such H. 



4 Degree-4 polynomials were also studied in [HS10], but the rank is slightly different there, as they allow some summands to be a product of two quadratic polynomials.



Lemma 19. For all non-constant $f : \{0,1\}^n \to \{0,1\}$, the following properties hold.

1. There is a subspace $V$ of co-dimension $r = \mathrm{rank}(f)$ s.t. when restricted to each of the $2^r$ affine subspaces $a + V$, $f$ has $\mathbb{F}_2$-degree at most $\deg_2(f) - 1$.

2. For all affine subspaces $H$ with co-dim$(H) < \mathrm{rank}(f)$, $\deg_2(f|_H) = \deg_2(f)$.

Proof. Fix a non-constant function $f : \{0,1\}^n \to \{0,1\}$.

1. Suppose $f = \ell_1 f_1 + \cdots + \ell_r f_r + f_0$, where $r = \mathrm{rank}(f)$. Let $V = \{x : \ell_i(x) = 0, \forall i \in [r]\}$. Then the conclusion holds by the definition of polynomial rank.

2. Suppose $H = a + V$ where $V$ is a subspace. Let $d = \deg_2(f)$, $r = \mathrm{rank}(f)$ and $k = $ co-dim$(V)$. Let $s$ be the smallest integer s.t.

$$f(x) = \ell_1(x) f_1(\ell_2(x), \ldots, \ell_n(x)) + \cdots + \ell_s(x) f_s(\ell_{s+1}(x), \ldots, \ell_n(x)) + f_0(\ell_{s+1}(x), \ldots, \ell_n(x)) \qquad (6)$$

for some linear functions $\ell_1(x), \ldots, \ell_k(x)$ s.t. when viewed as vectors, $\mathrm{span}\{\ell_1, \ldots, \ell_k\} = V^\perp$, and some functions $f_i$ whose $\mathbb{F}_2$-degrees are all strictly smaller than $d$. By the definition of the polynomial rank, we have that $r \le s$. Since we assumed that $k < r$, it holds that $k < s$. Consider the function

$$f'(\ell_{k+1}, \ldots, \ell_n) = \ell_{k+1} f_{k+1}(\ell_{k+2}, \ldots, \ell_n) + \cdots + \ell_s f_s(\ell_{s+1}, \ldots, \ell_n) + f_0(\ell_{s+1}, \ldots, \ell_n).$$

Since $k < s$, there is at least one term other than the last $f_0(\ell_{s+1}, \ldots, \ell_n)$. Since $s$ is minimized, the function $f'$ has $\mathbb{F}_2$-degree equal to $d$, because otherwise $f'$ can be written as just one degree-$(d-1)$ function, and thus the number of terms in Eq. (6) can be reduced. Furthermore, we claim that $f$ restricted on the affine subspace $V + a$ has $\mathbb{F}_2$-degree equal to $d$ as well. Indeed, the first $k$ terms in Eq. (6) give $\ell_1 f_1(\ell_2, \ldots, \ell_n) + \cdots + \ell_k f_k(\ell_{k+1}, \ldots, \ell_n)$, which has $\mathbb{F}_2$-degree at most $d - 1$ after $\ell_1, \ldots, \ell_k$ take the specific values given by $a$. Thus it cannot cancel any degree-$d$ monomial in $f'$. So $f$ on $V + a$ has $\mathbb{F}_2$-degree $d$.

□

Lemma 19, though seemingly simple, is of fundamental importance to our problem as well as to PDT algorithm design in general. Note that the second part of Lemma 19 says that, if there exists an affine subspace $V + a$ of co-dimension $k$ and a vector $a \in V^\perp$ such that $\deg_2(f|_{V+a}) < \deg_2(f)$, then $\mathrm{rank}(f) \le k$. Therefore Lemma 19 reduces the challenging task of lowering the degree of $f|_{V+a}$ for all $a$ to lowering it for just one $a$.

In the next two sections, what we are going to use is the following corollary of it. 

Corollary 20. For all non-constant $f : \{0,1\}^n \to \{0,1\}$, we have $\mathrm{rank}(f) \le C_{\oplus,\min}(f)$.

Proof. This immediately follows from the second item of Lemma 19, because $C_{\oplus,\min}(f)$ requires $\deg_2(f|_H) = 0$, strictly smaller than $\deg_2(f)$ for non-constant $f$. □

3.1 Main PDT algorithm

Now we describe the main algorithm for computing the function $f$, by reducing the $\mathbb{F}_2$-degree of $f$.



Main PDT Algorithm

Input: A PDT oracle for $x$
Output: $f(x)$.

1. while $\deg_2(f) \ge 1$ do
   (a) Take a fixed decomposition $f = \ell_1 f_1 + \cdots + \ell_r f_r + f_0$, where $r = \mathrm{rank}_d(f)$ with $d = \deg_2(f)$.
   (b) for $i = 1$ to $r$
   (c)     Query $\ell_i(x)$ and get answer $a_i$.
   (d) Update the function $f := a_1 f_1 + \cdots + a_r f_r + f_0$.
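The degree-reduction loop can be sketched in code under one loudly stated simplification: computing a minimum-rank decomposition is the hard part, so the sketch below does not attempt it. Instead, in each round it greedily picks a set of variables that hits every top-degree monomial, which is a valid (but generally non-optimal) decomposition with each $\ell_i$ a single variable. The representation of $f$ as a set of monomials and all function names are assumptions made for illustration only.

```python
from itertools import product


def degree(anf):
    """F_2-degree of a polynomial stored as a set of monomials (frozensets)."""
    return max((len(m) for m in anf), default=0)


def evaluate(anf, x):
    """Direct evaluation of the polynomial at the point x over F_2."""
    return sum(all(x[i] for i in m) for m in anf) % 2


def restrict(anf, i, a):
    """Substitute x_i = a and reduce modulo 2."""
    out = set()
    for m in anf:
        if i in m:
            if a == 0:
                continue          # the monomial vanishes
            m = m - {i}           # x_i = 1 drops out of the monomial
        out ^= {m}                # toggling implements addition over F_2
    return out


def hitting_variables(anf):
    """Greedy set of variables meeting every top-degree monomial.

    Assumption: this replaces the minimum-rank decomposition of step (a);
    it upper bounds the rank but is not guaranteed to attain it.
    """
    d = degree(anf)
    top = [m for m in anf if len(m) == d]
    chosen = []
    while top:
        counts = {}
        for m in top:
            for i in m:
                counts[i] = counts.get(i, 0) + 1
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        top = [m for m in top if best not in m]
    return chosen


def main_pdt(anf, query):
    """Run the degree-reduction loop; query(i) returns the bit x_i."""
    f, queries = set(anf), 0
    while degree(f) >= 1:
        for i in hitting_variables(f):
            f = restrict(f, i, query(i))
            queries += 1
    return (1 if frozenset() in f else 0), queries


# f = x0 x1 x2 + x1 x3 + x0 + 1
f = {frozenset({0, 1, 2}), frozenset({1, 3}), frozenset({0}), frozenset()}
for x in product([0, 1], repeat=4):
    value, used = main_pdt(f, lambda i: x[i])
    assert value == evaluate(f, x)
```

Each round queries one variable from every top-degree monomial, so the degree drops by at least one per round, mirroring the progress measure used in the analysis below.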





To analyze the query complexity of this algorithm, we need to bound $\mathrm{rank}(f)$. We conjecture that the following is true for all Fourier sparse Boolean functions.

Conjecture 21. For all Boolean functions $f : \{0,1\}^n \to \{0,1\}$, $\mathrm{rank}(f) = O(\log^c(\|\hat f\|_0))$ for some $c = O(1)$.

Call a complexity measure $M(f)$ downward non-increasing if $M(f') \le M(f)$ for any $f$ and any subfunction $f'$ of $f$. As mentioned earlier (Lemma 16), $M(f) = \|\hat f\|_0$ and $M(f) = \|\hat f\|_1$ are both downward non-increasing complexity measures.

Theorem 22. The Main PDT algorithm computes $f(x)$ correctly. If $\mathrm{rank}(f) \le M(f)$ for some downward non-increasing complexity measure $M$, then $D_\oplus(f) \le \deg_2(f) M(f)$ and $D^{cc}(f \circ \oplus) \le 2 \log \|\hat f\|_0 \cdot M(f)$. In particular, if Conjecture 21 is true, then the Log-rank conjecture holds for all XOR functions.

Proof. The correctness is obvious. For the query cost, there are at most $\deg_2(f)$ rounds since each round reduces the $\mathbb{F}_2$-degree by at least one. To avoid confusion, denote the original function by $f$ and the function in iteration $t$ by $f^{(t)}$. Since $f^{(t)}$ is obtained from $f$ by a sequence of linear restrictions, it is a subfunction of $f$. Each iteration $t$ takes $\mathrm{rank}(f^{(t)})$ queries. If $\mathrm{rank}(f) \le M(f)$, then in particular $\mathrm{rank}(f^{(t)}) \le M(f^{(t)}) \le M(f)$, since $M$ is a downward non-increasing complexity measure. Taking all iterations together, the total number of queries is at most $\deg_2(f) M(f)$. The communication complexity bound $D^{cc}(f \circ \oplus) \le 2 \log \|\hat f\|_0 \cdot M(f)$ follows from the standard simulation result (Fact 12) and the degree bound (Fact 18).

If Conjecture 21 is true, then the measure $M(f)$ is replaced by $\log^c(\|\hat f\|_0)$, and thus the above bound becomes $D^{cc}(f \circ \oplus) \le 2 \log^{c+1}(\|\hat f\|_0)$. Namely, the Log-rank conjecture holds for all XOR functions. □

The Main PDT algorithm, though simple, crucially uses the fact that restrictions do not increase the Fourier sparsity, and uses the $\mathbb{F}_2$-degree as a progress measure to govern the efficiency. Since $\deg_2(f) \le \log(\|\hat f\|_0)$, the algorithm finishes in a small number of rounds.

This algorithm also gives a unified way to construct parity decision trees, reducing the task of designing PDT algorithms to showing that the polynomial rank is small. Indeed, the results in the next two sections are obtained by bounding rank, where sometimes Theorem 22 will be applied with the complexity measure $\|\hat f\|_1$.

Note that if the conjecture $D_\oplus(f) \le \log^c \|\hat f\|_0$ is true, then the Main PDT algorithm always gives the optimal query cost up to a polynomial of power $c + 1$.

4 Functions with low F2-degree 

In this section, we mainly show that the Log-rank conjecture holds for XOR functions with constant $\mathbb{F}_2$-degree. We will actually prove

$$C_{\oplus,\min}(f) = O(2^{d^2/2} \log^{d-2} \|\hat f\|_1),$$

which is stronger than Lemma [U. Theorem 5 then follows from the PDT algorithm and the simulation protocol for PDT (Theorem 22). Corollary 6 is also easily proven by an induction on the $\mathbb{F}_2$-degree.

The case for degree 1 (linear functions) is trivial and the case for degree 2 (quadratic polynomials) is also simple due to the following theorem of Dickson.

Theorem 23 ([Dic58]). Let $A \in \{0,1\}^{n \times n}$ be a symmetric matrix whose diagonal entries are all 0, and define a polynomial $f(x) = x^T Q x + \ell(x) + \varepsilon$, where $Q$ is the upper triangle part of $A$. Then $\mathrm{rank}(f)$ is equal to the rank of the matrix $A$ over $\mathbb{F}_2$.

Note that Dickson's theorem says that, up to an affine (invertible) linear map, the Fourier spectrum of a degree-2 polynomial is identical to that of a bent function on $k$ variables $f(x) = x_1 x_2 + \cdots + x_{k-1} x_k$, where $k = \mathrm{rank}(A)$. Note that $C_{\oplus,\min}(f) \le k/2$ because we can simply fix $x_1 = x_3 = \cdots = x_{k-1} = 0$ and get a 0-constant function. It is also easily seen that this bent function has spectral norm $2^{k/2}$; it follows that $\mathrm{rank}(f) \le C_{\oplus,\min}(f) = O(\log \|\hat f\|_1)$.
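The three quantities in this remark can be checked numerically on the smallest interesting case, $k = 4$. The sketch below is only an illustration: the bitmask encoding of the matrix $A$, the zero-based variable names, and the helper names are our own assumptions. It computes the rank of $A$ over $\mathbb{F}_2$, the spectral norm of the $\pm 1$ version of $f(x) = x_0 x_1 + x_2 x_3$, and verifies that fixing $x_0 = x_2 = 0$ makes $f$ constant.

```python
def gf2_rank(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot                      # lowest set bit of the pivot
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def walsh_hadamard(values):
    """Unnormalized Fourier transform: F[s] = sum_x values[x] * (-1)^<s,x>."""
    coeffs = list(values)
    h = 1
    while h < len(coeffs):
        for start in range(0, len(coeffs), 2 * h):
            for j in range(start, start + h):
                a, b = coeffs[j], coeffs[j + h]
                coeffs[j], coeffs[j + h] = a + b, a - b
        h *= 2
    return coeffs


k = 4


def bent(x):
    """f(x) = x0 x1 + x2 x3 over F_2, with x encoded as an integer."""
    bits = [(x >> i) & 1 for i in range(k)]
    return (bits[0] & bits[1]) ^ (bits[2] & bits[3])


# Symmetric matrix A of the quadratic form: row i has a 1 in column j
# exactly when x_i x_j appears in f.
A = [0b0010, 0b0001, 0b1000, 0b0100]
signs = [(-1) ** bent(x) for x in range(2 ** k)]
spectral_norm_times_2k = sum(abs(c) for c in walsh_hadamard(signs))

print(gf2_rank(A))                         # 4
print(spectral_norm_times_2k // 2 ** k)    # 4, i.e. 2^(k/2)
print(all(bent(x) == 0 for x in range(2 ** k) if x & 0b0101 == 0))  # True
```

The spectrum is flat (every coefficient has absolute value $2^{-k/2}$), which is why the spectral norm equals $2^{k/2}$.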

4.1 Cubic polynomials 

We prove Theorem 5 for the special case of cubic polynomials first. This is because degree 3 is the first non-trivial case and we use this result in our final induction proof of Theorem 5; more importantly, the proof applies some ideas from [HS10] which inspire our proof for the general constant-degree case.

In [HS10], it was shown that for polynomials with $\mathbb{F}_2$-degree 3, $\mathrm{rank}(f) = O(\log^2(1/\mathrm{bias}(f)))$. By shifting the Fourier spectrum appropriately, we can make $\mathrm{bias}(f) \ge 1/\sqrt{\|\hat f\|_0}$ and thus get $\mathrm{rank}(f) = O(\log^2 \|\hat f\|_0)$. Next, we show that the bound can actually be improved to $\mathrm{rank}(f) = O(\log \|\hat f\|_0)$. This will also be used for the general degree case.

We need a lemma that relates the rank of a cubic polynomial and the ranks of its derivatives. 
We call a function linear if its F2-degree is at most 1, and quadratic if its F2-degree is at most 2. 
The following statement is slightly more general than Lemma 3.7 in [HS10], but the same proof 
goes through. 

Lemma 24 ([HS10]). Let $M$ be a collection of quadratic functions satisfying that $\mathrm{rank}_2(f) \le r$ for all $f \in M \cup 2M$ (where $2M = \{f_1 + f_2 : f_1, f_2 \in M\}$). Then there is a subspace $V$ of co-dimension at most $4r$ s.t. $f|_V$ is a linear function for all $f \in M$.

Now we can prove the cubic polynomial case of Theorem 5.

Proposition 25. For all functions $f : \{0,1\}^n \to \{0,1\}$ with $\mathbb{F}_2$-degree 3, it holds that $\mathrm{rank}(f) = O(\log \|\hat f\|_1)$ and thus $D_\oplus(f) = O(\log \|\hat f\|_1)$.

Proof. Note that $\Delta_t f$ has $\mathbb{F}_2$-degree at most 2 for all $t$, and that in general $\Delta_t f + \Delta_s f = \Delta_{t+s} f + \Delta_t \Delta_s f$. Let $M$ be the collection $\{\Delta_t f : t \in \{0,1\}^n\}$; then $M$ satisfies the condition of Lemma 24. Furthermore, each $\Delta_t f \in M$ has

$$\mathrm{rank}(\Delta_t f) = \log \|\widehat{\Delta_t f}\|_1 + 1 \le 2 \log \|\hat f\|_1 + 1,$$

where the last inequality is because of Lemma 15. Let $r = 2 \log \|\hat f\|_1 + 1$. Now by Lemma 24, we know that $4r$ restrictions can make all $\Delta_t f$ in $M$ become linear functions. Therefore there is a subspace $V$ of co-dimension at most $4r$ restricted on which $\Delta_t f$ is a linear function, for all $t \in \{0,1\}^n$. This means that $f|_V$ has degree at most 2. It follows that $\mathrm{rank}(f) \le 4r$. The upper bound on $D_\oplus(f)$ now follows by observing that $D_\oplus(f) \le \mathrm{rank}(f) + D_\oplus(f')$, where $f'$ is a subfunction of $f$ with $\mathbb{F}_2$-degree 2. Recall that for subfunctions we have $\|\hat{f'}\|_1 \le \|\hat f\|_1$, and hence $D_\oplus(f') = O(\log \|\hat f\|_1 + 1) = O(\log \|\hat f\|_1)$. □

4.2 Constant-degree polynomials 

Now we will bound $\mathrm{rank}(f)$ and use the Main PDT algorithm to bound the PDT complexity.

Lemma 26. For all non-constant functions $f : \{0,1\}^n \to \{0,1\}$ of $\mathbb{F}_2$-degree $d$, we have

$$\mathrm{rank}(f) \le C_{\oplus,\min}(f) \le D_\oplus(f) \le O(2^{d^2/2} (\log^{d-2} \|\hat f\|_1 + 1)).$$

Proof. We will prove by induction on the degree $d$ that

$$\mathrm{rank}(f) \le C_{\oplus,\min}(f) \le \max_{b \in \{0,1\}} C^b_{\oplus,\min}(f) \le D_\oplus(f) \le B_d(\|\widehat{f^\pm}\|_1),$$

where $B_d(m) \le 2^{d^2/2} \log^{d-2} m$ are a class of bounded non-decreasing (with respect to both $d$ and the argument $m$) functions to be determined later. The conclusion then follows from Eq. (3). The case of $d = 1$ is trivial, the case $d = 2$ is easily handled by Theorem 23, and the case $d = 3$ is given by Proposition 25.

Now suppose that the bound holds for all polynomials of $\mathbb{F}_2$-degree at most $d - 1$, and consider a function $f$ of degree $d \ge 4$. We will first prove a bound for $C_{\oplus,\min}(f)$, which also implies a bound on $\mathrm{rank}(f)$ from above by Corollary 20.

First, it is not hard to see that there exists a direction $t \in \{0,1\}^n - \{0^n\}$ such that $\Delta_t f$ is non-constant (unless $f$ is a linear function, in which case the conclusion trivially holds anyway). Fix such a $t$. Since $\deg_2(\Delta_t f) \le d - 1$, by the induction hypothesis, it holds that

$$C^b_{\oplus,\min}(\Delta_t f) \le B_{d-1}(\|\widehat{(\Delta_t f)^\pm}\|_1).$$

Define $f_t(x) = f(x + t)$; then by Lemma 15 we have

$$\|\widehat{(\Delta_t f)^\pm}\|_1 = \|\widehat{f^\pm \cdot f_t^\pm}\|_1 \le \|\widehat{f^\pm}\|_1 \|\widehat{f_t^\pm}\|_1 = \|\widehat{f^\pm}\|_1^2,$$

which implies that

$$C^b_{\oplus,\min}(\Delta_t f) \le B_{d-1}(\|\widehat{f^\pm}\|_1^2).$$

Since $\Delta_t f$ is non-constant, $C^b_{\oplus,\min}(\Delta_t f) \in [1, n]$ for both $b = 0$ and $b = 1$. For each $b$, by the definition of $C^b_{\oplus,\min}(\Delta_t f)$, there exists an affine subspace $H_b$ with co-dim$(H_b) \le B_{d-1}(\|\widehat{f^\pm}\|_1^2)$ such that $(\Delta_t f)|_{H_b} = b$, which is equivalent to $f(x) + f(x + t) = b$ for all $x \in H_b$.

Define

$$g_0(x) = \tfrac{1}{2}\left(f^\pm(x) +_{\mathbb{R}} f^\pm(x + t)\right), \qquad g_1(x) = \tfrac{1}{2}\left(f^\pm(x) -_{\mathbb{R}} f^\pm(x + t)\right),$$

where the plus $+_{\mathbb{R}}$ and minus $-_{\mathbb{R}}$ are over $\mathbb{R}$. (To avoid potential confusion, in the rest of the proof, we will also use this notation for addition and subtraction of two functions over $\mathbb{R}$.)

These two functions have some nice properties. First, it is easy to see from the definition of $g_0$ and $g_1$ that $f^\pm = g_0 +_{\mathbb{R}} g_1$. Second, note that $g_0$ and $g_1$ are not Boolean functions any more; they take values in $\{-1, 0, +1\}$. However, a simple but crucial fact is that they take very special values on the affine subspace $H_b$: one always takes value 0, and the other always takes a value in $\{+1, -1\}$. Actually, it is not hard to verify that

$$g_b|_{H_b} = f^\pm|_{H_b} \quad \text{and} \quad g_{1-b}|_{H_b} = 0.$$
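Third, in the Fourier domain, note that

$$\widehat{f_t^\pm}(s) = \mathbb{E}_x[f^\pm(x + t)\chi_s(x)] = \mathbb{E}_x[f^\pm(x + t)\chi_s(x + t)\chi_s(t)] = \widehat{f^\pm}(s)\chi_s(t),$$

and thus

$$\widehat{g_b}(s) = \tfrac{1}{2}\left(\widehat{f^\pm}(s) +_{\mathbb{R}} (-1)^b \widehat{f_t^\pm}(s)\right) = \tfrac{1}{2}\left(\widehat{f^\pm}(s) +_{\mathbb{R}} (-1)^b \chi_s(t)\widehat{f^\pm}(s)\right).$$

Therefore, we have

$$\widehat{g_0}(s) = \begin{cases} \widehat{f^\pm}(s) & s \in t^\perp \\ 0 & s \notin t^\perp \end{cases} \qquad \text{and} \qquad \widehat{g_1}(s) = \begin{cases} 0 & s \in t^\perp \\ \widehat{f^\pm}(s) & s \notin t^\perp, \end{cases}$$

where $t^\perp = \{s \in \{0,1\}^n : \langle s, t \rangle = 0\}$. Namely, $g_0$ and $g_1$ each takes the Fourier spectrum $\widehat{f^\pm}$ on one of the two hyperplanes defined by the vector $t$.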
This further implies that

$$\|\widehat{f^\pm}\|_1 = \|\widehat{g_0}\|_1 + \|\widehat{g_1}\|_1.$$

Thus, either $\|\widehat{g_0}\|_1$ or $\|\widehat{g_1}\|_1$ is at most half of $\|\widehat{f^\pm}\|_1$. Suppose that $\|\widehat{g_b}\|_1 \le \tfrac{1}{2}\|\widehat{f^\pm}\|_1$. We claim that restricting $f^\pm$ to $H_b$ reduces its spectral norm a lot. Indeed, since $f^\pm|_{H_b} = g_b|_{H_b}$, we have

$$\|\widehat{f^\pm|_{H_b}}\|_1 = \|\widehat{g_b|_{H_b}}\|_1 \le \|\widehat{g_b}\|_1 \le \tfrac{1}{2}\|\widehat{f^\pm}\|_1,$$

where the first inequality is because of Lemma 16. To summarize, we have just shown that we can reduce the spectral norm by at least half using at most $B_{d-1}(\|\widehat{f^\pm}\|_1^2)$ linear restrictions.

Now we recursively repeat the above process on the subfunction $f^\pm|_{H_b}$ until finally we find an affine subspace $H$ s.t. $\|\widehat{f^\pm|_H}\|_1 \le 1$, at which moment the subfunction is either a constant or a linear function, thus at most one more folding would give a constant function. In total it takes at most $B_{d-1}(\|\widehat{f^\pm}\|_1^2) \log \|\widehat{f^\pm}\|_1 + 1$ linear restrictions to get a constant function, which implies that

$$C_{\oplus,\min}(f) \le B_{d-1}(\|\widehat{f^\pm}\|_1^2) \log \|\widehat{f^\pm}\|_1 + 1.$$

Next we will show that $\max_{b \in \{0,1\}} C^b_{\oplus,\min}(f)$ is actually not much larger either:

$$\max_{b \in \{0,1\}} C^b_{\oplus,\min}(f) \le B_{d-1}(\|\widehat{f^\pm}\|_1^2) \log \|\widehat{f^\pm}\|_1 + B_{d-1}(\|\widehat{f^\pm}\|_1) + 1. \qquad (7)$$

(We need to show this because in the induction step, we picked one $g_b$ with smaller spectral norm and used the induction hypothesis to upper bound $C^b_{\oplus,\min}(\Delta_t f)$ for a particular $b$, which could be $\max_{b \in \{0,1\}} C^b_{\oplus,\min}(\Delta_t f)$.) Note that by the Main PDT algorithm, we know that

$$D_\oplus(f) \le \mathrm{rank}(f) + D_\oplus(f'),$$

for a subfunction $f'$ of $f$ with $\deg_2(f') < \deg_2(f)$. Now by Corollary 20, we can use $C_{\oplus,\min}(f)$ to upper bound $\mathrm{rank}(f)$. For the second part, since $\deg_2(f') < \deg_2(f)$ and $\|\widehat{(f')^\pm}\|_1 \le \|\widehat{f^\pm}\|_1$, we can apply the induction hypothesis on $f'$ to upper bound $D_\oplus(f')$. What we get here is

$$D_\oplus(f) \le B_{d-1}(\|\widehat{f^\pm}\|_1^2) \log \|\widehat{f^\pm}\|_1 + 1 + B_{d-1}(\|\widehat{f^\pm}\|_1). \qquad (8)$$

Eq. (7) thus follows from the simple bound $C^b_{\oplus,\min}(f) \le C_\oplus(f) \le D_\oplus(f)$. Now define the right-hand side of Eq. (8) to be $B_d(\|\widehat{f^\pm}\|_1)$, and solve the recursive relation

$$B_d(m) = B_{d-1}(m^2) \log m + B_{d-1}(m) + 1, \qquad B_3(m) = O(\log m + 1);$$

we get

$$B_d(m) = (1 + o(1))\, 2^{(d-2)(d-3)/2} \log^{d-2} m,$$

as desired. □
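The closed form can be checked numerically by unrolling the recurrence. The sketch below is a sanity check only, under two assumptions that are ours: the hidden constant in $B_3$ is taken to be 1, so $B_3(m) = \log m + 1$, and the recurrence is expressed in the variable $L = \log_2 m$, so that $B_{d-1}(m^2)$ becomes an evaluation at $2L$.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bound(d, L):
    """B_d as a function of L = log2(m), assuming B_3(m) = log2(m) + 1."""
    if d == 3:
        return L + 1
    # B_d(m) = B_{d-1}(m^2) * log m + B_{d-1}(m) + 1, and log(m^2) = 2 * log m.
    return bound(d - 1, 2 * L) * L + bound(d - 1, L) + 1


def leading_term(d, L):
    """The claimed leading term 2^((d-2)(d-3)/2) * log^(d-2) m."""
    return 2 ** ((d - 2) * (d - 3) // 2) * L ** (d - 2)


L = 1000
for d in range(4, 8):
    # The ratio tends to 1 as L grows, matching the (1 + o(1)) factor.
    print(d, bound(d, L) / leading_term(d, L))
```

For instance, unrolling once gives $B_4 = 2L^2 + 2L + 2$ and $B_5 = 8L^3 + 6L^2 + 4L + 3$ under the stated assumption, whose leading coefficients $2$ and $8$ agree with $2^{(d-2)(d-3)/2}$.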

Note that in the above proof, it seems that we lose something by using $C_{\oplus,\min}$ to upper bound rank. However, it is crucial to consider the affine subspace $H_b$ on which $\Delta_t f$ becomes a constant (instead of, say, a lower $\mathbb{F}_2$-degree polynomial), because otherwise $g_b$ on $H_b$ is not equal to $f^\pm$ (actually not even Boolean), and thus we cannot recursively apply the procedure on $f|_{H_b}$. In addition, if $\Delta_t f$ is not constant on $H_b$, then we cannot guarantee the decrease of the spectral norm due to the restriction on $H_b$.

We have just shown that low-degree polynomials have a very small $C_{\oplus,\min}$ value in terms of the spectral norm. We actually conjecture that the bound can be improved to the following.

Conjecture 27. There is some absolute constant $c$ s.t. for any non-constant $f : \{0,1\}^n \to \{0,1\}$,

$$C_{\oplus,\min}(f) = O(\log^c \|\hat f\|_1).$$

It has the following consequence.

Proposition 28. If Conjecture 27 is true, then for any $f : \{0,1\}^n \to \{0,1\}$, $\mathrm{rank}(f) = O(\log^c \|\hat f\|_1)$ and $D_\oplus(f) = O(\deg_2(f) \log^c \|\hat f\|_1)$.

In fact, we are not aware of any counterexample to Conjecture 27 even for $c = 1$; see the last section for more discussion on this.

Lemma 26 also implies the following corollary, from which Corollary 7 immediately follows.

Corollary 29. If $f : \{0,1\}^n \to \{0,1\}$ has $\mathbb{F}_2$-degree $d$, then $f = \sum_{i=1}^{T} \pm 1_{V_i}$, where $T = 2^{2^{d^2/2} \log^{d-2} \|\hat f\|_1}$ and each $1_{V_i}$ is the indicator function of the subspace $V_i$.

Proof. By Lemma 26, we know that the depth of the optimal PDT is at most $2^{d^2/2} \log^{d-2} \|\hat f\|_1$, and thus the size of the PDT is at most $T$. So the function can be written as the sum of at most $T$ indicator functions $1_H$ of affine subspaces. Then, as argued in [GS08], each such indicator $1_H$ can be written as $1_{V_1} - 1_{V_2}$ for two subspaces $V_1$ and $V_2$. The conclusion thus follows. □

5 Functions with a small spectral norm 

We prove Lemma 30 in this section, which directly implies Lemma 8, Theorem 9 and Theorem 10.

Lemma 30. For all Boolean functions $f : \{0,1\}^n \to \{+1,-1\}$, we have $C_{\oplus,\min}(f) \le O(\|\hat f\|_1)$.

Proof. Suppose that the nonzero Fourier coefficients are $\{\hat f(\alpha) : \alpha \in A\}$, where $A = \mathrm{supp}(\hat f)$. Denote by $a_1, a_2, \ldots, a_s$ the sequence of $|\hat f(\alpha)|$ in decreasing order, and let the corresponding characters be $\chi_{\alpha_1}, \ldots, \chi_{\alpha_s}$ in that order (thus $|\hat f(\alpha_i)| = a_i$ and $s = \|\hat f\|_0$ is the Fourier sparsity of $f$). For simplicity, we assume $s \ge 4$, as doing so can only add at most a constant to our bound on $C_{\oplus,\min}(f)$.

Consider the following greedy folding process: fold along $\beta = \alpha_1 + \alpha_2$ and select a proper half-space, namely impose a linear restriction $\chi_\beta(x) = b$ for some $b \in \{0,1\}$, s.t. the subfunction has its largest Fourier coefficient being $a_1 + a_2$ (in absolute value). This is achievable according to Lemma 16.

We first show that at most $O(\|\hat f\|_1)$ greedy foldings can boost $a_1$, the largest Fourier coefficient in absolute value, to at least $1/2$. By Parseval's Identity, we have

$$1 - a_1^2 = \sum_{i \ge 2} a_i^2 \le a_2 \sum_{i \ge 2} a_i = a_2(\|\hat f\|_1 - a_1).$$

So when $a_1 < 1/2$, the greedy folding increases the largest coefficient by

$$a_2 \ge \frac{1 - a_1^2}{\|\hat f\|_1 - a_1} \ge \frac{3}{4\|\hat f\|_1}.$$

Hence the largest coefficient would be larger than $1/2$ in $O(\|\hat f\|_1)$ steps. (After one folding, the function becomes a subfunction of the previous one, but due to Lemma 16 the $\ell_1$-norm of its Fourier spectrum only decreases. So we can safely use $\|\hat f\|_1$ as a universal upper bound for this sequence of subfunctions.)

Next we show that greedy folding decreases the Fourier $\ell_1$-norm by at least $2a_1 = 2\max_s |\hat f(s)|$. Define

$$P_+(\beta) = \{(s,t) : s + t = \beta,\ \hat f(s) \cdot \hat f(t) > 0\} \quad \text{and} \quad P_-(\beta) = \{(s,t) : s + t = \beta,\ \hat f(s) \cdot \hat f(t) < 0\},$$

where all pairs $(s,t)$ are unordered; the same holds for the rest of the proof. By comparing the old and new Fourier spectra, we can easily see that the drop of the Fourier $\ell_1$-norm is precisely

$$2 \cdot \sum_{(\alpha_i, \alpha_j) \in P_-(\beta)} \min\{|\hat f(\alpha_i)|, |\hat f(\alpha_j)|\}.$$

Note that the folding is chosen such that the largest two Fourier coefficients have the same sign, so $(\alpha_1, \alpha_2) \in P_+(\beta)$. Next we will use the property that $f$ is a Boolean function. By Proposition [HI, $\sum_{\alpha_i + \alpha_j = \beta} \hat f(\alpha_i)\hat f(\alpha_j) = 0$, thus $\sum_{(\alpha_i,\alpha_j) \in P_+(\beta)} a_i a_j = \sum_{(\alpha_i,\alpha_j) \in P_-(\beta)} a_i a_j$. Now we have

$$a_1 a_2 \le \sum_{(i,j) \in P_+(\beta)} a_i a_j = \sum_{(i,j) \in P_-(\beta)} a_i a_j \le a_3 \sum_{(i,j) \in P_-(\beta)} \min\{a_i, a_j\}.$$

Therefore, the decrease of the Fourier $\ell_1$-norm is at least $2a_1 a_2 / a_3 \ge 2a_1$. Thus once $a_1 \ge 1/2$, each greedy folding decreases the Fourier $\ell_1$-norm by at least 1. So it takes at most $\|\hat f\|_1$ further steps to make the Fourier $\ell_1$-norm at most 1, in which case at most one more folding makes the function constant. □

Lemma 30 implies that $\mathrm{rank}(f) \le O(\|\hat f\|_1)$ (Lemma 8) by Corollary 20 (that $\mathrm{rank}(f) \le C_{\oplus,\min}(f)$).

Note that our Main PDT algorithm can be simply simulated by a protocol in which Alice and Bob send $\ell_i(x)$ and $\ell_i(y)$, respectively. Thus, similar to Fact 12, we have $D^{cc}(f \circ \oplus) \le 2D_\oplus(f)$, where $f \circ \oplus : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$. Theorem 9 basically follows from this lemma and the fact that subfunctions have smaller spectral norm (Lemma 16).
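The simulation step rests on the linearity of parities: $\ell(x \oplus y) = \ell(x) \oplus \ell(y)$, so one bit from each player answers one parity query on $x \oplus y$. A minimal sketch, with inputs and parities encoded as integer bitmasks (an encoding and function names chosen here purely for illustration):

```python
def parity(mask, z):
    """The linear function <mask, z> over F_2, with bit-vectors as integers."""
    return bin(mask & z).count("1") % 2


def simulate_query(mask, x, y):
    """Answer the parity query `mask` on x XOR y using two bits of communication."""
    alice_bit = parity(mask, x)   # sent by Alice, who holds x
    bob_bit = parity(mask, y)     # sent by Bob, who holds y
    return alice_bit ^ bob_bit, 2


answer, bits_sent = simulate_query(0b1011, 0b0110, 0b1101)
assert answer == parity(0b1011, 0b0110 ^ 0b1101)
```

A parity decision tree of depth $D$ is therefore simulated with $2D$ bits, which is where the factor 2 in the bound comes from.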

Lemma 30 also implies Theorem 10, which asserts upper bounds on the deterministic communication complexity of $f \circ \oplus$ as

$$D^{cc}(f \circ \oplus) = O(\deg_2(f) \cdot \|\hat f\|_1) = O\left(\sqrt{\mathrm{rank}(M_{f \circ \oplus})} \log \mathrm{rank}(M_{f \circ \oplus})\right).$$

To see this, first recall Theorem 22, which states that $D^{cc}(f \circ \oplus) \le 2\log \|\hat f\|_0 \cdot M(f)$ where $M(f)$ is a downward non-increasing complexity measure. By Lemma 8 we can take $M$ to be $\|\hat f\|_1$. Now combining these with Fact 18 (that $\deg_2(f) \le \log \|\hat f\|_0$), and the inequality that $\|\hat f\|_1 \le \sqrt{\|\hat f\|_0}$, yields Theorem 10.

6 Functions with a light Fourier tail 

First we will show that functions with low density can be computed efficiently by PDT. We will need a result by Chang [Cha02]. The following version is taken from a simplified proof in [IMR12]. Recall that for a function $f : \{0,1\}^n \to \{0,1\}$, its density is $\rho_1(f) = |f^{-1}(1)|/2^n$.

Lemma 31 ([Cha02, IMR12]). For all $f : \{0,1\}^n \to \{0,1\}$ and any $\varepsilon > 0$, the set $\{s : |\hat f(s)| \ge \varepsilon\}$ spans a subspace of dimension less than $d = 2\left(\rho_1(f)/\varepsilon\right)^2 \ln(1/\rho_1(f))$.

Another fact that we will need is the granularity of Boolean functions, first studied in [GOS+11].

Definition 3. The (Fourier) granularity of a function $f : \{0,1\}^n \to \{+1,-1\}$, denoted $\mathrm{gran}(f)$, is the minimum integer $k$ s.t. all nonzero Fourier coefficients are integer multiples of $2^{-k}$.

The following lemma relates granularity and sparsity.

Lemma 32 ([GOS+11]). Any Boolean function $f : \{0,1\}^n \to \{+1,-1\}$ with $\|\hat f\|_0 \ge 2$ has $\mathrm{gran}(f) \le \lfloor \log_2 \|\hat f\|_0 \rfloor - 1$.

Now we can show the lemma for low-density functions. 

Lemma 33. For all $f : \{0,1\}^n \to \{0,1\}$ with $\rho_1(f) = \mathrm{polylog}(\|\hat f\|_0)/\|\hat f\|_0$, $D_\oplus(f) \le \log^{O(1)}(\|\hat f\|_0)$. Thus the Log-rank Conjecture is true for $f \circ \oplus$.

For completeness, we give a self-contained proof (without resorting to [IMR12]) of this lemma using the Bonami-Beckner inequality in Appendix B, with a slightly worse parameter.

Proof. Suppose that $\rho_1(f) = \log^c \|\hat f\|_0 / \|\hat f\|_0$. By Lemma 32, the minimum Fourier coefficient (in absolute value) is at least $2/\|\hat f\|_0$. Taking $\varepsilon$ as this value and applying Lemma 31, we know that all nonzero Fourier coefficients are in a subspace of dimension

$$O\left((\rho_1(f)\|\hat f\|_0)^2 \ln(1/\rho_1(f))\right) = O(\log^{2c+1} \|\hat f\|_0).$$

This implies that there exists an invertible linear transformation $L$ such that all the non-zero Fourier coefficients of $f \circ L$ lie in a subspace of dimension $d = O(\log^{2c+1} \|\hat f\|_0)$. By choosing the basis appropriately, we may assume, without loss of generality, that the subspace is just $\{0,1\}^d \times \{0\}^{n-d}$. Thus a decision tree algorithm for $f \circ L$ can simply query these bits. Therefore, $D_\oplus(f) \le D(f \circ L) \le \log^{O(1)}(\|\hat f\|_0)$. □



The last lemma we need is the following result by Gopalan et al. [GOS+11]. Recall that a function $f : \{0,1\}^n \to \{+1,-1\}$ is $\mu$-close to $s$-sparse in $\ell_2$ if $\sum_{i > s} \hat f(s_i)^2 \le \mu^2$, where $|\hat f(s_1)| \ge \cdots \ge |\hat f(s_{2^n})|$. We say two functions $f, g : \{0,1\}^n \to \{+1,-1\}$ are $\varepsilon$-close if $\Pr_x[f(x) \ne g(x)] \le \varepsilon$.

Lemma 34 ([GOS+11]). If $f : \{0,1\}^n \to \{+1,-1\}$ is $\mu$-close to $s$-sparse in $\ell_2$, where $\mu \le \frac{1}{20s^2}$, then $f$ is $\mu^2/2$-close to a Boolean function $g : \{0,1\}^n \to \{+1,-1\}$ of Fourier sparsity $s$.

Putting these results together, we can prove Theorem 11.

Theorem 11 (Restated). If $f : \{0,1\}^n \to \{+1,-1\}$ is $\mu$-close to $s$-sparse in $\ell_2$, where $\mu \le \frac{\log^{O(1)} \|\hat f\|_0}{\sqrt{\|\hat f\|_0}}$ and $s \le \log^{O(1)} \|\hat f\|_0$, then $D_\oplus(f) \le \log^{O(1)} \|\hat f\|_0$.

Proof. Since $f$ is $\mu$-close to $s$-sparse, and $20s^2\mu = \frac{\log^{O(1)} \|\hat f\|_0}{\sqrt{\|\hat f\|_0}} < 1$ for sufficiently large $\|\hat f\|_0$, by Lemma 34, $f$ is $\mu^2/2$-close to a Boolean function $g$ of Fourier sparsity $s$. We will compute $f$ by computing $g$ and $fg$. By the setting of the parameter $\mu$, it holds that $\mu^2/2 \le \frac{\log^{O(1)} \|\hat f\|_0}{\|\hat f\|_0}$, and hence $\rho_{-1}(fg) \le \frac{\log^{O(1)} \|\hat f\|_0}{\|\hat f\|_0}$. Note that $\rho_{-1}$ in the $\{+1,-1\}$-range representation is just the same as $\rho_1$ in the $\{0,1\}$-range representation. Applying Lemma 15, we have $\|\widehat{fg}\|_0 \le \|\hat f\|_0 \|\hat g\|_0 \le \|\hat f\|_0 \cdot s$. Now by Lemma 33, we see that the Boolean function $fg$ can be computed using $\log^{O(1)} \|\hat f\|_0$ queries. To compute $g$ itself, we can just use the trivial upper bound $D_\oplus(g) \le \|\hat g\|_0 = s = \log^{O(1)} \|\hat f\|_0$. Thus

$$D_\oplus(f) \le D_\oplus(g) + D_\oplus(fg) \le \log^{O(1)} \|\hat f\|_0,$$

as desired. □




7 Concluding remarks 

The major open question is to prove Conjecture 21, $\mathrm{rank}(f) = O(\log^c(\|\hat f\|_0))$, or even the stronger Conjecture 27, $C_{\oplus,\min}(f) = O(\log^c \|\hat f\|_1 + 1)$. In general, the gap between $\|\hat f\|_0$ and $\|\hat f\|_1$ can be huge. For instance, the AND function of $n$ variables has $\|\hat f\|_0 = 2^n$ and $\|\hat f\|_1 = O(1)$. Thus one may think that Conjecture 27 is probably too strong to hold. However, note that the AND function has a large $\mathbb{F}_2$-degree, and Fourier sparse functions always have $\mathbb{F}_2$-degree smaller than $\log \|\hat f\|_0$. We actually do not know any counterexample to Conjecture 27 even for $c = 1$. Indeed, we can show that Conjecture 27 actually holds for several classes of functions, where for symmetric functions we use a result from [AFH12].



Proposition 35. Conjecture 27 is true with $c = 1$ for affine subspace indicators $1_H$, degree-$d$ bent functions $x_1 \cdots x_d + \cdots + x_{n-d+1} \cdots x_n$ and all symmetric functions.

Note that if $C_{\oplus,\min}(f) = O(\log \|\hat f\|_1)$ is true, then we not only have $D^{cc}(f \circ \oplus) = O(\log^2 \mathrm{rank}(M_{f \circ \oplus}))$, but also further improve Green and Sanders's result to $T = \|\hat f\|_1^{O(d)}$; see Proposition 28 and Corollary 29.

In the upper bound in Lemma 26, the $2^{d^2/2}$ factor comes from the fact that $\|\widehat{\Delta_t f}\|_1 \le \|\hat f\|_1^2$. As we have the freedom of picking any $t$, is it possible that, for any Boolean function $f$, one can always find a $t$ such that the spectral norm of its derivative $\|\widehat{\Delta_t f}\|_1$ is much smaller than the trivial upper bound $\|\hat f\|_1^2$?

A Restriction of functions on affine subspace: Proof of Lemma 16



Proof (of Lemma 16). We will show the conclusion for affine subspaces of co-dimension 1; the second and third conclusions for general $H$ follow by repeatedly applying the result. When co-dim$(H) = 1$, namely co-dim$(V) = 1$, there is a unique non-zero vector $t \in \{0,1\}^n$ orthogonal to all vectors in $V$. Take a basis $r^1, \ldots, r^{n-1}$ of $V$, and further take a vector $r^n \notin V$. Define an $n \times n$ matrix $R = [r^1, \ldots, r^n]$; then $Rf(y) = f(Ry) = f(y_1 r^1 + \cdots + y_n r^n)$. Define two functions $f_0, f_1 : \{0,1\}^{n-1} \to \mathbb{R}$ by $f_b(y) = Rf(yb)$, namely

$$f_0(y_1 \ldots y_{n-1}) = f(y_1 r^1 + \cdots + y_{n-1} r^{n-1}) \quad \text{and} \quad f_1(y_1 \ldots y_{n-1}) = f(y_1 r^1 + \cdots + y_{n-1} r^{n-1} + r^n). \qquad (9)$$

Since $f_b$ is defined on $\{0,1\}^{n-1}$, its Fourier spectrum can be defined as before, and we will use it as the Fourier spectrum of $f|_H$. Now we will prove that this choice of definition satisfies the three conditions. Note that though the choice of $r^1, \ldots, r^n$ is not unique, the vector of Fourier coefficients is the same up to a permutation, and in particular, its $\ell_p$-norm does not depend on the choice of $R$.

1. Let us first compute the Fourier coefficients of the function $Rf : \{0,1\}^n \to \mathbb{R}$. It is not hard to see that for any $s \in \{0,1\}^{n-1}$, we have

$$\widehat{Rf}(s0) = \tfrac{1}{2}\left(\hat f_0(s) + \hat f_1(s)\right) \quad \text{and} \quad \widehat{Rf}(s1) = \tfrac{1}{2}\left(\hat f_0(s) - \hat f_1(s)\right).$$

This implies that

$$\hat f_0(s) = \hat f((R^T)^{-1}(s0)) + \hat f((R^T)^{-1}(s1)) \quad \text{and} \quad \hat f_1(s) = \hat f((R^T)^{-1}(s0)) - \hat f((R^T)^{-1}(s1)),$$

where we used the fact that $\widehat{Rf}(s) = \hat f((R^T)^{-1}s)$ for any invertible linear transformation $R$. So in either subfunction, the pairs of Fourier coefficients of $f$ that collide are

$$\left\{\left(\hat f((R^T)^{-1}(s0)),\ \hat f((R^T)^{-1}(s1))\right) : s \in \{0,1\}^{n-1}\right\}.$$

To see the relation between these two characters, suppose that the rows of $L = R^{-1}$ are $l^1, \ldots, l^n$; then by $LR = I$, we know that $\langle l^n, r^1 \rangle = \cdots = \langle l^n, r^{n-1} \rangle = 0$. Since there is only one nonzero vector, $t$, orthogonal to all of $r^1, \ldots, r^{n-1}$, we have $l^n = t$, and thus $(R^T)^{-1}(s0) + (R^T)^{-1}(s1) = t$. So the pairs are just those $(s, s + t)$.

2. Since the Fourier spectrum of $f_b$ is formed by pairing up (using plus or minus) the Fourier spectrum of $f$, by the standard fact that $|a|^p + |b|^p \ge |a + b|^p$ for any $p \in [0,1]$, we know that $\|\hat f_b\|_p \le \|\hat f\|_p$.

3. First, for a Boolean function $f_b : \{0,1\}^{n-1} \to \{+1,-1\}$, we know that $f_b(y) = \pm\chi_s(y) \iff \|\hat f_b\|_0 = 1$ by the definition of the $\ell_0$-norm. Since $1 = \|\hat f_b\|_2 \le \|\hat f_b\|_1 \le \|\hat f_b\|_0$ by Parseval's Identity, it is easily seen that $\|\hat f_b\|_1 = 1$ is equivalent to there being only one nonzero Fourier coefficient.

The conclusion now follows by noting that $f_b$ is a linear function if and only if $f|_H(x) = R^{-1}f_b(x) = f_b(R^{-1}x)$ is a linear function.

□
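The pairing of coefficients in items 1 and 2 can be verified numerically in the simplest coordinate system, where $t$ is the last standard basis vector and $H$ is the hyperplane $x_{n-1} = b$ (so no rotation $R$ is needed). This is a sketch under that simplifying assumption; the truth-table indexing $x = x_0 + 2x_1 + \cdots$ and the helper names are ours.

```python
def walsh_hadamard(values):
    """Unnormalized Fourier transform: F[s] = sum_x values[x] * (-1)^<s,x>."""
    coeffs = list(values)
    h = 1
    while h < len(coeffs):
        for start in range(0, len(coeffs), 2 * h):
            for j in range(start, start + h):
                a, b = coeffs[j], coeffs[j + h]
                coeffs[j], coeffs[j + h] = a + b, a - b
        h *= 2
    return coeffs


def restrict_last_bit(values, b):
    """Truth table of the subfunction on the hyperplane x_{n-1} = b."""
    half = len(values) // 2
    return values[b * half:(b + 1) * half]


# A +-1 valued function on 3 bits: the sign version of majority.
f = [-1 if bin(x).count("1") >= 2 else 1 for x in range(8)]
F = walsh_hadamard(f)
half = len(f) // 2

for b in (0, 1):
    F_b = walsh_hadamard(restrict_last_bit(f, b))
    for s in range(half):
        # Normalized form: f_b^(s) = f^(s) + (-1)^b f^(s + t), with t <-> index `half`.
        assert 2 * F_b[s] == F[s] + (-1) ** b * F[s + half]
    # Spectral norm does not increase: sum|F_b| / 2^(n-1) <= sum|F| / 2^n.
    assert 2 * sum(abs(c) for c in F_b) <= sum(abs(c) for c in F)
```

The factor 2 in both assertions only accounts for the different normalizations $2^{n-1}$ and $2^n$ of the unnormalized transforms.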



B A proof of Lemma 33



For any $0 \le \eta \le 1$ we can define a linear operator $T_\eta : \mathbb{C}^{\{0,1\}^n} \to \mathbb{C}^{\{0,1\}^n}$ such that, for any $f : \{0,1\}^n \to \mathbb{C}$ with Fourier expansion $f(x) = \sum_{t \in \{0,1\}^n} \hat f(t)\chi_t(x)$, $T_\eta(f)$ is a complex-valued function over the Boolean cube such that for every $x \in \{0,1\}^n$,

$$T_\eta(f)(x) = \sum_{t \in \{0,1\}^n} \eta^{|t|} \hat f(t) \chi_t(x).$$

The following remarkable theorem [Bon70, Bec75] shows that $T_\eta$ is a norm-1 operator from $L^{1+\eta^2}(\{0,1\}^n)$ to $L^2(\{0,1\}^n)$.

Theorem 36 (Bonami-Beckner Theorem). Let $f : \{0,1\}^n \to \mathbb{C}$ be a function defined over the Boolean cube. Then for every $0 \le \eta \le 1$,

$$\|T_\eta f\|_2 \le \|f\|_{1+\eta^2}.$$

Lemma 37. Let $f : \{0,1\}^n \to \{0,1\}$ be a Boolean function with Fourier sparsity $s$ and density $\rho_1(f) = O(\mathrm{polylog}\, s)/s$. Then there exists an invertible linear map $L : \{0,1\}^n \to \{0,1\}^n$ such that all the non-zero Fourier coefficients of $Lf$ lie in a subspace of dimension $d = O(\mathrm{polylog}\, s)$.

Proof. Let $A \subseteq \{0,1\}^n$ be the set of vectors in $\{0,1\}^n$ at which the Fourier coefficients of $f$ are non-zero. Suppose $d = \dim(\mathrm{Span}(A))$. Let $\ell_1, \ldots, \ell_d$ be a set of $d$ linearly independent vectors in $A$. Let $L : \{0,1\}^n \to \{0,1\}^n$ be an invertible linear map. If we define a new Boolean function $f' : \{0,1\}^n \to \{0,1\}$ such that $f' := Lf$, then it can be readily verified that $\hat{f'}(\alpha) = \hat f((L^{-1})^T \alpha)$ for every $\alpha \in \{0,1\}^n$. Therefore, by choosing $L$ appropriately and replacing $f$ with $f'$, we may assume that $\ell_1, \ldots, \ell_d$ are the standard basis vectors $e_1, \ldots, e_d$. Recall that for every $\alpha \in A$, $|\hat{f'}(\alpha)| \ge 1/s$; then for any $0 \le \eta \le 1$, we have

$$\frac{d\eta^2}{s^2} \le \sum_{i=1}^{d} |\eta \hat{f'}(e_i)|^2 = \sum_{i=1}^{d} |\eta^{|e_i|} \hat{f'}(e_i)|^2 \le \sum_{t \in A} |\eta^{|t|} \hat{f'}(t)|^2 \le \sum_{t} |\eta^{|t|} \hat{f'}(t)|^2 = \|T_\eta f'\|_2^2 \quad \text{(Parseval's Identity)}$$

$$\le \|f'\|_{1+\eta^2}^2 \quad \text{(Theorem 36)} \quad = \rho_1(f')^{\frac{2}{1+\eta^2}},$$

where in the last step we make use of the fact that $f'$ is a Boolean function. Now taking $\eta = 1/\sqrt{\log s}$ gives $d \le O(\mathrm{polylog}\, s)$. □